Title: [Repost] docx2html.vbs

Author: tmplinshi    Time: 2013-3-31 05:10     Title: docx2html.vbs

Code source: http://svn.greenstone.org/main/t ... s/bin/docx2html.vbs

Usage:
  CScript //Nologo docx2html.vbs c:\test.docx c:\test.html

Option Explicit

' http://www.robvanderwoude.com/vbstech_automation_word.php
' http://www.nilpo.com/2008/06/windows-scripting/reading-word-documents-in-wsh/ - for grabbing just the text (cleaned of Word mark-up) from a doc(x)
' http://msdn.microsoft.com/en-us/library/3ca8tfek%28v=VS.85%29.aspx - VBScript Functions (CreateObject etc)
' http://msdn.microsoft.com/en-us/library/aa220734%28v=office.11%29.aspx - SaveAs Method. Expand "WdSaveFormat" section to see all the default filetypes Office 2003+ can save as

' Error Handling:
' http://blogs.msdn.com/b/ericlippert/archive/2004/08/19/error-handling-in-vbscript-part-one.aspx
' http://msdn.microsoft.com/en-us/library/53f3k80h%28v=VS.85%29.aspx

' To Do:
' +1. error output on bad input to this file. And commit.
' +1b. Active X error msg when trying to convert normal *.doc: only when windows scripting is on and Word not installed.
' +1c. Make docx accepted by default as well. Changed WordPlugin.
' 2. Try converting from other office types (xlsx, pptx) to html. They may use other constants for conversion filetypes
' 3. gsConvert.pl's any_to_txt can be implemented for docx by getting all the text contents. Use a separate subroutine for this. Or use wdFormatUnicodeText as outputformat.
' 4. Try out this script on Windows 7 to see whether WSH is active by default, as it is on XP and Vista.
' 5. What kind of error occurs if any when user tries to convert docx on a machine with an old version of Word (pre-docx/pre-Word 2007)?
' 6. Ask Dr Bainbridge whether this script can or shouldn't replace word2html, since this can launch all versions of word (not just 2007) I think.
' Unless some commands have changed? Including for other Office apps, in which case word2html would remain the correct program to use for those cases.

' gsConvert.pl expects error output to go to the console's STDERR
' for which we need to launch this vbs with "CScript //Nologo"
' (cannot use WScript if using StdErr,
' and //Nologo is needed to suppress Microsoft logo text output which messes up error reporting)
' http://www.devguru.com/technologies/wsh/quickref/wscript_StdErr.html
Dim objStdErr, args
Set objStdErr = WScript.StdErr

args = WScript.Arguments.Count
If args < 2 Then
  'WScript.Echo Usage: args.vbs argument [input docx path] [output html path]
  objStdErr.Write ("ERROR. Usage: CScript //Nologo " & WScript.ScriptName & " [input office doc path] [output html path]" & vbCrLf)
  WScript.Quit
End If

' Now run the conversion subroutine
Doc2HTML WScript.Arguments.Item(0), WScript.Arguments.Item(1)

' In terminal, run as: > docx2html.vbs C:\fullpath\to\input.docx C:\fullpath\to\output.html
' or, if you want echoed error output to go to console (instead of creating a popup) and to avoid 2 lines of MS logo, as:
' > CScript //Nologo docx2html.vbs C:\fullpath\to\input.docx C:\fullpath\to\output.html
' Will be using WScript.StdErr object to make error output go to stderr of CScript console (can't launch with WScript).
' http://www.devguru.com/technologies/wsh/quickref/wscript_StdErr.html
Sub Doc2HTML( inFile, outHTML )
' This subroutine opens a Word document,
' then saves it as HTML, and closes Word.
' If the HTML file exists, it is overwritten.
' If Word was already active, the subroutine
' will leave the other document(s) alone and
' close only its "own" document.
'
' Written by Rob van der Woude
' http://www.robvanderwoude.com

    ' Standard housekeeping
    Dim objDoc, objFile, objFSO, objWord, strFile

    Const wdFormatDocument                    =  0
    Const wdFormatDocument97                  =  0
    Const wdFormatDocumentDefault             = 16
    Const wdFormatDOSText                     =  4
    Const wdFormatDOSTextLineBreaks           =  5
    Const wdFormatEncodedText                 =  7
    Const wdFormatFilteredHTML                = 10
    Const wdFormatFlatXML                     = 19
    Const wdFormatFlatXMLMacroEnabled         = 20
    Const wdFormatFlatXMLTemplate             = 21
    Const wdFormatFlatXMLTemplateMacroEnabled = 22
    Const wdFormatHTML                        =  8
    Const wdFormatPDF                         = 17
    Const wdFormatRTF                         =  6
    Const wdFormatTemplate                    =  1
    Const wdFormatTemplate97                  =  1
    Const wdFormatText                        =  2
    Const wdFormatTextLineBreaks              =  3
    Const wdFormatUnicodeText                 =  7
    Const wdFormatWebArchive                  =  9
    Const wdFormatXML                         = 11
    Const wdFormatXMLDocument                 = 12
    Const wdFormatXMLDocumentMacroEnabled     = 13
    Const wdFormatXMLTemplate                 = 14
    Const wdFormatXMLTemplateMacroEnabled     = 15
    Const wdFormatXPS                         = 18

    ' Create a File System object
    Set objFSO = CreateObject( "Scripting.FileSystemObject" )

    ' Create a Word object. Exit with error msg if not possible (such as when Word is not installed)
    On Error Resume Next
    Set objWord = CreateObject( "Word.Application" )
    ' Any error here means Word could not be started; the usual one is
    ' 429, the error code for "ActiveX component can't create object"
    ' http://msdn.microsoft.com/en-us/library/xe43cc8d%28v=VS.85%29.aspx
    If Err.Number <> 0 Then
        'WScript.Echo "Microsoft Word cannot be found -- document conversion cannot take place. Error #" & CStr(Err.Number) & ": " & Err.Description & "." & vbCrLf
        objStdErr.Write ("ERROR: Windows-scripting failed. Document conversion cannot take place:" & vbCrLf)
        objStdErr.Write ("   Microsoft Word cannot be found or cannot be launched. (Error #" & CStr(Err.Number) & ": " & Err.Description & "). " & vbCrLf)
        objStdErr.Write ("   For converting the latest Office documents, install OpenOffice and Greenstone's OpenOffice extension. (Turn it on and turn off windows-scripting.)" & vbCrLf)
        Exit Sub
    End If

    With objWord
        ' True: make Word visible; False: invisible
        .Visible = False

        ' Check if the Word document exists
        If objFSO.FileExists( inFile ) Then
            Set objFile = objFSO.GetFile( inFile )
            strFile = objFile.Path
        Else
            'WScript.Echo "FILE OPEN ERROR: The file does not exist" & vbCrLf
            objStdErr.Write ("ERROR: Windows-scripting failed. Cannot open " & inFile & ". The file does not exist. " & vbCrLf)
            ' Close Word
            .Quit
            Exit Sub
        End If

        'outHTML = objFSO.BuildPath( objFile.ParentFolder, _
        '          objFSO.GetBaseName( objFile ) & ".html" )

        ' Open the Word document and keep a reference to it.
        ' "On Error Resume Next" is still in effect, so check Err explicitly
        ' instead of letting a failed open or save pass silently.
        Err.Clear
        Set objDoc = .Documents.Open( strFile )
        If Err.Number <> 0 Then
            objStdErr.Write ("ERROR: Windows-scripting failed. Cannot open " & strFile & ". (Error #" & CStr(Err.Number) & ": " & Err.Description & "). " & vbCrLf)
            ' Close Word
            .Quit
            Exit Sub
        End If

        ' Save as HTML -- http://msdn.microsoft.com/en-us/library/aa220734%28v=office.11%29.aspx
        objDoc.SaveAs outHTML, wdFormatFilteredHTML
        If Err.Number <> 0 Then
            objStdErr.Write ("ERROR: Windows-scripting failed. Cannot save " & outHTML & ". (Error #" & CStr(Err.Number) & ": " & Err.Description & "). " & vbCrLf)
        End If

        ' Close the document opened by this subroutine
        objDoc.Close

        ' Close Word
        .Quit
    End With
End Sub

Author: hitman    Time: 2013-3-31 10:13

Thanks to tmplinshi for collecting and sharing this.




Welcome to 批处理之家 (http://bbs.bathome.net/) Powered by Discuz! 7.2