Date: 2020-07-27
Parser and writer for various word processing doc formats. Pure-JS cleanroom implementation from official specifications, related documents, and test files. Emphasis on parsing and writing robustness, cross-format feature compatibility with a unified JS representation, and maximal browser compatibility.
Test files should be placed in the test_files directory, in the appropriate subdirectory for the filetype. For example, DOCX files should be placed in test_files\docx\wordjs and RTF files should be in test_files\rtf\wordjs.
Every test file should be accompanied by a plain text .txt representation whose filename is the original filename with .txt appended. For example, the DOCX file test_files\docx\wordjs\foo.docx pairs with the plain text file test_files\docx\wordjs\foo.docx.txt.
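The naming convention above can be sketched in a few lines of Node.js. The helper names `testFilePath` and `baselinePath` are hypothetical and are not part of this project. The sketch assumes Windows-style separators, as in the examples above.

```javascript
const path = require("path");

// Hypothetical helper: build the location of a test file from its
// filetype subdirectory (e.g. "docx"), folder (e.g. "wordjs"), and name.
function testFilePath(ext, folder, name) {
  // path.win32 forces backslash separators on every platform.
  return path.win32.join("test_files", ext, folder, name);
}

// Hypothetical helper: the baseline is the original filename with
// ".txt" appended (the original extension is kept, not replaced).
function baselinePath(testFile) {
  return testFile + ".txt";
}

const docx = testFilePath("docx", "wordjs", "foo.docx");
console.log(docx);               // test_files\docx\wordjs\foo.docx
console.log(baselinePath(docx)); // test_files\docx\wordjs\foo.docx.txt
```

Appending rather than replacing the extension keeps baselines distinct when two test files share a base name but differ in format.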
Generating Baselines using Word for Windows
In an administrator PowerShell (PS) 7.0 session, run Set-ExecutionPolicy RemoteSigned or Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass.
Then run .\generate_txt.ps1 .\test_files\EXT_TYPE\FOLDER (for example, .\generate_txt.ps1 .\test_files\docx\apachepoi).
On first run, if a test file does not have an accompanying .txt file, the script will open Word and save the file as plaintext. Word will rapidly open and close during this process.
The script will not attempt to open Word or try to generate .txt files if they already exist. After a clean run, Word should not open on future runs.
The script will halt for documents that are broken in certain ways. Word will display a prompt, stalling the automated process. Those documents can be skipped by creating a .skip file as described below.
Skipping Files
The script will look for files with the .skip extension and skip processing the base file. For example, if test_files\docx\wordjs\Hello.docx.skip exists, the script will not attempt to process test_files\docx\wordjs\Hello.docx.
When the UI blocks (for example, on a VBA error with ThisDocument), the corresponding .skip file should be created manually. The script merely tests if the file exists, so the content is immaterial and a single letter suffices.
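The selection rule described so far (no existing .txt baseline, no .skip marker) can be illustrated with a short Node.js sketch. This is not the logic of generate_txt.ps1 itself. `filesToProcess` is a hypothetical name, and the sketch assumes that no test input has a .txt or .skip extension of its own.

```javascript
// Hypothetical helper: given the file names in one test folder, return
// the test files that would still need to be opened in Word.
function filesToProcess(names) {
  const present = new Set(names);
  return names.filter((name) => {
    // Assumption: companion files are never test inputs themselves.
    if (name.endsWith(".txt") || name.endsWith(".skip")) return false;
    // Only existence is checked; the content of the companion is ignored.
    return !present.has(name + ".txt") && !present.has(name + ".skip");
  });
}

const listing = [
  "foo.docx", "foo.docx.txt",      // baseline exists: not reprocessed
  "Hello.docx", "Hello.docx.skip", // skip marker exists: not processed
  "new.docx",                      // no companion: needs a baseline
];
console.log(filesToProcess(listing)); // [ 'new.docx' ]
```

After a clean run every test file has one of the two companions, so the returned list is empty, which matches the expectation that Word does not open on later runs.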
Generating .skip files
The script will attempt to open password-protected documents using the password "WordJS". The script will not halt on such a document, but it will not generate a text file either. Instead, it writes a message to the terminal indicating the skip and generates a .skip file for the document.
Please consult the attached LICENSE file for details. All rights not explicitly granted by the Apache 2.0 License are reserved by the Original Author.
MS-CFB: Compound File Binary File Format
MS-DOC: Word (.doc) Binary File Format
RTF: Rich Text Format