I needed to batch-process documents, but python_docx does not support the doc format, so I use Python to batch-convert doc files to docx.

Environment

  • Mac

  • Python 3.5.6

Install LibreOffice

Download link

After installation, move the application into Applications.

Run a test

After running the command below, a file named 1146.docx appears in the output directory, and the Word document can then be processed with python_docx.

$ /Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to docx /Users/see/documents/1146.doc --outdir /Users/see/documents/
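The same single-file test can also be driven from Python, which makes it easy to confirm that the converted file really appeared. This is only a minimal sketch: the helper names build_command, expected_output and convert_one are made up for illustration, and it assumes LibreOffice sits at the path used in the command above and names the result after the input file, as in the 1146.doc example.

```python
import os
import subprocess

# Path taken from the command above; adjust it if LibreOffice lives elsewhere.
SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def build_command(src, outdir, soffice=SOFFICE):
    """Return the argument list that converts one file to docx (hypothetical helper)."""
    return [soffice, "--headless", "--convert-to", "docx", src, "--outdir", outdir]


def expected_output(src, outdir):
    """Return the path where the converted file should appear (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(src))[0]
    return os.path.join(outdir, stem + ".docx")


def convert_one(src, outdir, soffice=SOFFICE):
    """Convert one file; raise if soffice fails or no output file shows up."""
    subprocess.check_call(build_command(src, outdir, soffice))
    target = expected_output(src, outdir)
    if not os.path.isfile(target):
        raise RuntimeError("conversion produced no file: " + target)
    return target
```

For the test file, this would be called as convert_one("/Users/see/documents/1146.doc", "/Users/see/documents/"). Checking for the output file is a deliberate choice, since a zero exit status alone does not prove that a docx was written.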

Batch conversion with Python

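# -*- coding: utf-8 -*-
import os
import subprocess

soffice = "/Applications/LibreOffice.app/Contents/MacOS/soffice"
source = "/Users/see/documents/doc"
dest = "/Users/see/documents/docx"

# Walk the source tree and convert every .doc file found
for path, dir_list, file_list in os.walk(source):
    for file_name in file_list:
        # Skip anything that is not a .doc file (for example .DS_Store)
        if not file_name.lower().endswith(".doc"):
            continue
        file = os.path.join(path, file_name)
        print(file)
        # soffice writes the converted .docx into dest
        output = subprocess.check_output([soffice, "--headless", "--convert-to", "docx", file, "--outdir", dest])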

Then just run the script.
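One thing to watch with this approach: --outdir puts every result into a single flat directory, so two doc files with the same name in different subfolders would end up writing to the same docx. A dry-run pass before converting can flag that, and can also skip files that were already converted on an earlier run. The following is only a sketch; plan_conversions is a hypothetical helper, and it assumes each output is named after its input with the extension swapped to .docx.

```python
import os


def plan_conversions(source, dest):
    """Map each .doc under source to its flat output path in dest.

    Returns (todo, skipped, collisions):
      todo       - list of (input_path, output_path) still to convert
      skipped    - inputs whose output already exists in dest
      collisions - inputs whose output name is already claimed by an earlier input
    """
    todo, skipped, collisions = [], [], []
    claimed = set()
    for path, dir_list, file_list in os.walk(source):
        dir_list.sort()  # visit subdirectories in a deterministic order
        for file_name in sorted(file_list):
            stem, ext = os.path.splitext(file_name)
            if ext.lower() != ".doc":
                continue  # ignore .docx, .DS_Store and anything else
            src = os.path.join(path, file_name)
            out = os.path.join(dest, stem + ".docx")
            if out in claimed:
                collisions.append(src)
            elif os.path.exists(out):
                skipped.append(src)
            else:
                todo.append((src, out))
            claimed.add(out)
    return todo, skipped, collisions
```

Calling plan_conversions(source, dest) with the two paths from the script gives the list of files still worth passing to soffice, plus the ones that need renaming or a separate output folder first.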