Word 2007 files are really zipped collections of mostly XML files. XML is not tolerant of file corruption, and judging from the errors it generates, Word 2007 appears to use a fairly corruption-intolerant XML reader even when it tries to salvage text from a corrupt Word 2007 .docx file. Damaged DOCX2TXT uses an unzipper that tolerates corrupted XML files, and Perl code that extracts the text from document.xml, the file inside a .docx where all of the unformatted text resides. Because this Perl code does not use a standard XML parsing module but simply strips the markup surrounding the text, it extracts the text more or less perfectly up to the point in document.xml where the corruption starts. Word 2007, on the other hand, appears to return no results at all if it encounters any error in document.xml. The program has a Perl/Tk GUI front end.
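To make the approach concrete, here is a minimal Python sketch of the same idea. It is not the program's Perl code. It assumes the damaged file has been read into memory as bytes, locates `word/document.xml` by scanning for ZIP local file headers (so a damaged central directory at the end of the archive does not matter), and inflates the raw deflate stream chunk by chunk, keeping whatever was decoded before the first error. The text is then recovered by stripping tags with regular expressions rather than parsing the XML. The function names (`salvage_member`, `extract_text`, `damaged_docx_to_text`), the chunk size, and the rules that turn `</w:p>`, `<w:tab/>` and `<w:br/>` into line breaks and tabs are illustrative assumptions.

```python
import html
import re
import struct
import zlib

# ZIP local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length (30 bytes).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
SIGNATURE = b"PK\x03\x04"


def salvage_member(archive: bytes, name: str = "word/document.xml", chunk: int = 512) -> bytes:
    """Hypothetical helper: recover as much of one ZIP member as possible.

    Scans local file headers directly instead of trusting the central
    directory, which sits at the end of the file and is often the part lost.
    """
    target = name.encode("utf-8")
    pos = archive.find(SIGNATURE)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(archive):
        fields = LOCAL_HEADER.unpack_from(archive, pos)
        method, csize, name_len, extra_len = fields[3], fields[7], fields[9], fields[10]
        name_start = pos + LOCAL_HEADER.size
        data_start = name_start + name_len + extra_len
        if archive[name_start:name_start + name_len] == target:
            if method == 0:
                # Stored: return whatever bytes survive. (If the writer used a
                # data descriptor, csize is 0 and this sketch returns nothing.)
                return archive[data_start:data_start + csize]
            if method == 8:
                # Raw deflate stream: feed it in small chunks and keep the
                # output produced before the first zlib error.
                inflater = zlib.decompressobj(-zlib.MAX_WBITS)
                out = bytearray()
                for i in range(data_start, len(archive), chunk):
                    try:
                        out += inflater.decompress(archive[i:i + chunk])
                    except zlib.error:
                        break  # corruption reached; keep what came out before it
                    if inflater.eof:
                        break
                return bytes(out)
        pos = archive.find(SIGNATURE, pos + 4)
    return b""


def extract_text(xml: bytes) -> str:
    """Hypothetical helper: strip WordprocessingML markup without an XML parser."""
    s = xml.decode("utf-8", errors="ignore")  # a cut may split a multi-byte character
    s = s.replace("</w:p>", "\n")              # end of paragraph -> newline (assumed rule)
    s = re.sub(r"<w:tab\s*/>", "\t", s)        # tab element -> tab character (assumed rule)
    s = re.sub(r"<w:br\s*/>", "\n", s)         # manual line break -> newline (assumed rule)
    s = re.sub(r"<[^>]*>?", "", s)             # drop all other tags, even a truncated last one
    return html.unescape(s).strip()            # decode &amp;, &lt;, &#8217; and similar entities


def damaged_docx_to_text(path: str) -> str:
    """Hypothetical entry point: salvage document.xml from a file and strip its markup."""
    with open(path, "rb") as f:
        return extract_text(salvage_member(f.read()))
```

For example, `damaged_docx_to_text("report.docx")` (a hypothetical file name) returns the text up to the damaged region. Feeding the inflater in small chunks means that when zlib raises an error, only the output of the failing chunk is lost; corruption that does not break the deflate stream can instead show up as garbled text.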
Damaged DOCX2TXT 1.0
**Damaged DOCX2TXT License**
This software is based on the docx2txt project by Sandeep Kumar, which is released under the GNU General Public License. Damaged DOCX2TXT is released under the same license.
**CAKE3 License**
Because this component uses SharpZipLib, the license of CAKE3 is, to avoid confusion, the same as the SharpZipLib license. The following is quoted from their website:
The library is released under the GPL with the following exception:
Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License cover the whole combination.
As a special exception, the copyright holders of this library give you permission to link this library with independent modules to produce an executable, regardless of the license terms of these independent modules, and to copy and distribute the resulting executable under terms of your choice, provided that you also meet, for each linked independent module, the terms and conditions of the license of that module. An independent module is a module which is not derived from or based on this library. If you modify this library, you may extend this exception to your version of the library, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.
Bottom line: In plain English, this means you can use this library in commercial closed-source applications.
***
.NET version 2