

PHPWord corrupted file?

php,ms-word,openxml,docx,phpword
My basic PHPWord setup is working. This is my code: <?php require_once 'PhpWord/Autoloader.php'; \PhpOffice\PhpWord\Autoloader::register(); function getEndingNotes($writers) { $result = ''; // Do not show execution time for index if (!IS_INDEX) { $result .= date('H:i:s') . " Done writing file(s)" . EOL; $result .= date('H:i:s') . " Peak memory usage: "...

POI ignoring some snippets of docx

java,xml,apache-poi,docx
I'm trying to use this code (POI 3.11) to extract text from a docx file: XWPFDocument doc = new XWPFDocument(OPCPackage.open("sample.docx")); for (XWPFParagraph p : doc.getParagraphs()) { List<XWPFRun> runs = p.getRuns(); if (runs != null) { for (XWPFRun r : runs) { String text = r.getText(0); System.out.println(text); } } } Here...

XSLT read Table of contents from docx document.xml

xslt,docx,toc
I am trying to retrieve the TOC from a docx's document.xml file using XSLT. Here is my XSLT: <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sap="http://www.sap.com/sapxsl" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" exclude-result-prefixes="w" version="2.0"> <xsl:output indent="yes" method="xml"/> <xsl:strip-space elements="*"/> <xsl:template match="w:sdt"> <xsl:element name="root"> <xsl:attribute name="label"> <xsl:value-of...

Jinja2 for word templating

python,jinja2,template-engine,docx
I would like to use Jinja2 for Word templating, as mentioned in this short article. The problem I'm facing is as follows: if I put {{title}} in my Word file, the resulting XML can look like this: <w:r><w:t>{{</w:t></w:r><w:proofErr w:type="gramStart"/><w:r><w:t>title</w:t></w:r><w:proofErr w:type="gramEnd"/><w:r><w:t>}}</w:t></w:r></w:p> so it is impossible for Jinja to replace this accordingly. Is...

Regex to match xml tag with multiple attributes

regex,docx
I'm trying to find a regular expression that can match the tag <w:proofErr .... />. The regex101 link: regex101 The original string is: <w:pPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:autoSpaceDE w:val="0"/><w:autoSpaceDN w:val="0"/><w:adjustRightInd w:val="0"/><w:spacing w:after="0" w:line="240" w:lineRule="auto"/><w:rPr><w:rFonts w:cs="SerifGothicStd-Bold"/><w:b/><w:bCs/><w:sz w:val="24"/><w:szCs...
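
A minimal Python sketch of one pattern that matches a self-closing <w:proofErr ... /> tag regardless of how many attributes it carries. The names PROOF_ERR and strip_proof_err are made up for illustration, and the sample string is the run fragment quoted in the Jinja2 question above. It assumes no attribute value contains a literal ">" character; a real XML parser is the safer choice when that cannot be guaranteed.

```python
import re

# Matches a self-closing <w:proofErr ... /> tag with any number of attributes.
# \b stops the pattern from matching longer tag names that merely start
# with "proofErr"; [^>]* consumes the attributes without crossing into
# the next tag.
PROOF_ERR = re.compile(r"<w:proofErr\b[^>]*/>")


def strip_proof_err(xml: str) -> str:
    """Return xml with every <w:proofErr .../> tag removed."""
    return PROOF_ERR.sub("", xml)


# Fragment taken from the Jinja2 question (without the trailing </w:p>).
sample = (
    '<w:r><w:t>{{</w:t></w:r><w:proofErr w:type="gramStart"/>'
    '<w:r><w:t>title</w:t></w:r><w:proofErr w:type="gramEnd"/>'
    '<w:r><w:t>}}</w:t></w:r>'
)

print(PROOF_ERR.findall(sample))
print(strip_proof_err(sample))
```

Removing the tags alone does not merge the three runs back into one, so a placeholder such as {{title}} would still be split across <w:r> elements after this step.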

DocX C# library changing page format on InsertSectionPageBreak

c#,ms-word,docx
I am working with DocX, a library for creating Microsoft .docx files in C#. https://docx.codeplex.com/ I am loading a preexisting file into the program and then adding content. It was the easiest way to get a predefined header. I noticed that if I use InsertSectionPageBreak the page format of...

Opening Word (.docx) files on a Windows Form C#

c#,winforms,webbrowser-control,docx
I'm trying to give my program the ability to display a Microsoft Word file on a form, but I'm not having any luck. I want to be able to open the file and display it on the form as read-only. So basically just display its contents....

set case of content bound via content controls in docx

ms-word,docx,docx4j
I have a docx file that contains a custom part and a web page that collects input from the user to populate that custom part. One of my "variables" is used multiple times in the document. In some cases, I need it to appear in ALL CAPS. In most cases,...

Asciidoc and math equation does not render on .docx

docx,doc,pandoc,asciidoc,asciidoctor
I'm trying to convert .adoc files to .docx. Currently I'm using: asciidoctor file.adoc -o file.html pandoc -s -S file.html -o output.docx My math equations or symbols inside the .adoc look like: latexmath:[$\phi$] and more text as Inline test latexmath:[$\sin(x)$] After conversion, the .docx contains these strange lines:...

Cannot count number of characters in a docx file generated from a non-empty XHTML

java,jaxb,xhtml,docx,docx4j
I implemented an XHTML-to-DocX converter using DocX4J. It creates the DocX file without problems. To finish my task I decided to implement a simple test. The test consists of counting the number of chars in the DocX created and then comparing it with the already known number of...

PHP DOM Document ignore whitespaces appending nodeValue

php,dom,docx
I'm working with an MS Office Word document through PHP and DOM. I am adding paragraphs to my document. And now I have to make part of a string bold (it comes from a database and I'm unable to change it). Like this: The part of string is bold really. What I...

How to insert line break into Word (docx) document using OpenXMLPowerTools?

c#,.net,openxml,docx,openxml-sdk
I'm writing a library which generates Word documents based on a template. Some text needs to be replaced with other text. Everything seems to be working; there is a TextReplacer class which can perform replacements. Things get worse when I need to replace a single-line part of text with...

Using WordprocessingDocument error: Unable to create mutex

c#,.net,openxml,docx,openxml-sdk
I'm using this simple pattern to create a docx file in an ASP.NET app: var outputFileName = "creating some file name here..."; var outputFile = string.Format("~/App_Data/files/{0}.docx", outputFileName); // creating a file stream to write to var outputStream = new FileStream(HttpContext.Current.Server.MapPath(outputFile), FileMode.OpenOrCreate); // creating the default template using (var sr =...

Batch file for running a Java command

batch-file,docx
I have to run the following command for hundreds of .docx files in a directory on Windows in order to convert them to .txt. java -jar tika-app-1.3.jar -t somedocfile.doc > converted.txt I was wondering if there is any automatic way such as writing a ".bat" file to do this....
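
The question asks for a ".bat" file; as an alternative, the same loop can be sketched in Python. This is only an illustration under stated assumptions: the jar name and the -t flag are copied from the command in the question, the function names tika_command and convert_all are hypothetical, and the script is assumed to run from a directory where java and the jar can be found.

```python
import subprocess
from pathlib import Path

# Jar name copied from the command quoted in the question.
TIKA_JAR = "tika-app-1.3.jar"


def tika_command(doc: Path) -> list[str]:
    """Build the argument list equivalent to: java -jar tika-app-1.3.jar -t <doc>."""
    return ["java", "-jar", TIKA_JAR, "-t", str(doc)]


def convert_all(directory: str, run=subprocess.run) -> list[Path]:
    """Convert every .docx in directory, writing <name>.txt next to each file.

    run is injectable so the loop can be exercised without Java installed.
    """
    outputs = []
    for doc in sorted(Path(directory).glob("*.docx")):
        out = doc.with_suffix(".txt")
        # Redirect the command's stdout into the .txt file, like "> converted.txt".
        with out.open("wb") as fh:
            run(tika_command(doc), stdout=fh, check=True)
        outputs.append(out)
    return outputs
```

Calling convert_all(".") would process the current directory and return the list of .txt paths it wrote; check=True makes the loop stop at the first file that fails to convert.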

How to get rmarkdown and knitr to use em-dash with .docx files?

knitr,docx,rmarkdown
I am new to using rmarkdown and knitr to produce .docx word documents. The rmarkdown reference guide states that using -- gives an en-dash, and --- gives an em-dash. If I knit my .Rmd file to HTML then the en-dashes and em-dashes are working correctly, however when knitting to a...

Changing XML of docx to load images from web

xml,image,ms-word,docx
Basically, my docx is very large because it has many images, and I wanted to reduce its size. I tried everything, including compressing the images, and got it from 25MB down to 13MB. But I wanted to lower it more so I was playing around and...

Unable to connect the LibreOffice on port 2002?

docx,libreoffice,doc,libreoffice-base,docverter
I am using docvert 5.1 to convert .doc to HTML. When I run the "Tests (run all)" I get the following error message: " ✘Unable to run tests due to exception. Failed to connect to LibreOffice on port 2002. Connector : couldn't connect to socket...

How to find a list in docx using python?

python,docx,python-docx
I'm trying to pull apart a word document that looks like this: 1.0 List item 1.1 List item 1.2 List item 2.0 List item It is stored in docx, and I'm using python-docx to try to parse it. Unfortunately, it loses all the numbering at the start. I'm trying to...

Apply a TableStyle to a Word Table

c#,ms-word,openxml,docx
Trying to style a table using a predefined style but nothing is working. I've tried with a newly created document and one created from a saved template. Using the SDK Productivity tool I can see the style is there in the template but it's not being applied. I've tried...

Converting a docx containing a chart to PDF

java,pdf,charts,docx,docx4j
I've got a docx4j generated file which contains several tables, titles and, finally, an excel-generated curve chart. I have tried many approaches in order to convert this file to PDF, but did not get to any successful result. Docx4j with xsl-fo did not work, most of the things included in...

Output Web Page as DOCX?

docx
In the past, if I wanted a web page to display as a .DOC word document, I could do so by doing this in the page load: Response.AddHeader("content-disposition", "attachment;filename=FullDetail.doc") Response.ContentType = "application/vnd.word" I was hoping to output the web page as a .DOCX by doing: Response.AddHeader("content-disposition", "attachment;filename=FullDetail.docx") Response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"...

Reading docx files, recognizing and storing italicized text

python,string,docx
How should I go about reading a .docx file with Python and being able to recognize the italicized text and storing it as a string? I looked at the docx python package but all I see is features for writing to a .docx file. I appreciate the help in advance...
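
One way to approach this without a third-party package is to read word/document.xml straight out of the .docx archive, since a .docx file is a ZIP container. The sketch below is an illustration under stated assumptions, not a complete solution: the function name italic_text is made up, and it only detects italics set directly on a run through <w:i> in the run properties, so italics inherited from a paragraph or character style are missed.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def italic_text(docx_path) -> list[str]:
    """Return the text of each run whose run properties contain <w:i>."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    found = []
    for run in root.iter(f"{{{W}}}r"):
        italic = run.find("w:rPr/w:i", NS)
        if italic is None:
            continue
        # <w:i w:val="false"/> (or "0"/"off") explicitly switches italics off.
        if italic.get(f"{{{W}}}val", "true") in ("false", "0", "off"):
            continue
        # A run may hold several <w:t> elements; join them into one string.
        found.append("".join(t.text or "" for t in run.findall("w:t", NS)))
    return found
```

Joining the returned list with "".join(...) gives a single string of all directly italicized text, in document order.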

Docx generation - reuse

pdf-generation,docx,xdocreport
I'm looking to generate docx and pdf documents in my Java application. The best, most cost-effective solution seems to be xdocreport - I've started using it and it's good. However, xdocreport doesn't seem to allow reuse of common sections across documents. E.g. I want to create two documents -...

Get docx file contents using JavaScript/jQuery

javascript,jquery,docx
I wish to open/read a docx file using client-side technologies (HTML/JS). Kindly assist if this is possible. I have found a JavaScript library named docx.js but cannot seem to locate any documentation for it (http://blog.innovatejs.com/?p=184). The goal is to make a browser-based search tool for docx files...

Convert docx to mediawiki and preserve [[Image:]]

converter,mediawiki,docx,libreoffice,soffice
Currently, I'm trying to convert a docx to a mediawiki file and preserve the proper filenames in the [[Image:]] tags. For some reason, the proper image file gets swallowed (i.e., normally it'd be media/image4.jpg, but instead it's just empty). I've tried extracting the docx and looking at docx/word/_rels/document.xml.rels but I...

Apache POI characters run for .docx

java,api,apache-poi,document,docx
In .doc files, there is a function to get each character run in a paragraph by using CharacterRun charrun = paragraph.getCharacterRun(k++); and then I can use those character runs to inspect their attributes like if ( charrun.isBold() == true) System.out.print(charrun.text()); or something like that. But .docx files seem to have no...