Converting MS Office 2003 documents into PDF
Posted by Tobias on December 16th 2009
Believe it or not, not everyone has the latest version of Microsoft Office installed. That is why I was given the task of automating the conversion of documents to PDF. The company that ordered this project sent its official documents to customers in Word 2003 format and received a lot of complaints because the customers couldn't open them. PDF is a widely supported format that almost every organization can read, which makes it ideal for sending documents. The company wanted an easy way to convert Word documents to PDF from within their CRM (Customer Relationship Management) system. With Office 2007 this isn't very hard: you just save the document as a PDF. However, this company is still using Office 2003, which has no PDF support.
So, how do we solve this? Simple: we install one of the many good freeware PDF printers available online. Sure, this solves part of the problem, but users would still have to open the file from within the CRM system, print it to the PDF printer, find the PDF and send it to the customer. The problem is that users are lazy, so we need a really simple way of converting the files from within the CRM system, preferably with a single button click.
So, how can we do this? Well, by installing a PDF printer, half the job is already done. To fully automate the process we must:
- Start Word (or some other program)
- Open the document
- Print the document to the PDF printer
- Pick up the newly created PDF and import it into the CRM system
Easy, right? It would be, if the PDF printer had some way of taking a file name as an argument and using it for the PDF. Most of the PDF printing software I tested did not have this feature, and the few that did only offered it in paid versions. To get around this, we need to know how PDF printers work. Most of them work the same way: they install a plain PostScript printer driver, print to a PostScript file, and invoke Ghostscript to convert the PostScript into PDF. Sounds simple, right? So why not do this ourselves, you say? Good question! Let's!
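The whole trick comes down to Ghostscript's `-sOutputFile` switch, which lets us pick the PDF file name ourselves. Here is a minimal sketch of a helper that builds that command line, assuming the paths are passed in as plain strings. The class and method names (`GhostscriptArgs.Build`) are hypothetical; the switches themselves are standard Ghostscript options.

```csharp
using System;

// Hypothetical helper: builds the Ghostscript arguments that turn a
// PostScript file into a PDF with a file name of our choosing.
public static class GhostscriptArgs
{
    public static string Build(string psFile, string pdfFile)
    {
        if (string.IsNullOrEmpty(psFile))
            throw new ArgumentException("PostScript path required", nameof(psFile));
        if (string.IsNullOrEmpty(pdfFile))
            throw new ArgumentException("PDF path required", nameof(pdfFile));

        // -q                : suppress startup messages
        // -dSAFER           : restrict file access from inside the PostScript
        // -dBATCH           : exit when done instead of waiting at a prompt
        // -dNOPAUSE         : do not pause between pages
        // -sDEVICE=pdfwrite : produce PDF output
        // -sOutputFile=...  : the PDF file name we choose
        // Both paths are quoted so that spaces in them survive.
        return string.Format(
            "-q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=\"{0}\" \"{1}\"",
            pdfFile, psFile);
    }
}
```

The returned string is meant to go into `ProcessStartInfo.Arguments` when starting `gswin32c`. Note that every switch uses a plain ASCII hyphen; Ghostscript does not recognize look-alike dash characters.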
What we need:
- A PostScript printer driver
- Ghostscript
- Some programming skills
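Since Ghostscript runs as a separate program, it can help to check up front that it can actually be found. The sketch below searches the directories listed in the PATH environment variable for a given executable, assuming the 32-bit Windows console build named `gswin32c.exe`. `GhostscriptLocator` is a hypothetical name.

```csharp
using System;
using System.IO;

// Hypothetical helper: finds an executable in the directories of a PATH string.
public static class GhostscriptLocator
{
    // Returns the full path of the first match, or null if none is found.
    public static string FindOnPath(string exeName, string pathVariable)
    {
        if (string.IsNullOrEmpty(pathVariable)) return null;
        foreach (string dir in pathVariable.Split(Path.PathSeparator))
        {
            // PATH entries are sometimes quoted or padded with spaces.
            string trimmed = dir.Trim().Trim('"');
            if (trimmed.Length == 0) continue;
            string candidate = Path.Combine(trimmed, exeName);
            if (File.Exists(candidate)) return candidate;
        }
        return null;
    }

    // Convenience overload that reads the current process's PATH.
    public static string FindOnPath(string exeName)
    {
        return FindOnPath(exeName, Environment.GetEnvironmentVariable("PATH"));
    }
}
```

If `GhostscriptLocator.FindOnPath("gswin32c.exe")` returns null, the program can show a clear "Ghostscript is not installed" message instead of letting `Process.Start` fail with a less helpful exception.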
I used C# to implement this solution, but I imagine almost any language would do. Almost... With .NET we have access to the Office COM interface, which we can use to control Word. I'll be using MS Word in this example, but the principle is the same for other programs such as Excel or PowerPoint. I'll stop rambling now and give you a code example.
Example code
```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Windows.Forms;
using Word = Microsoft.Office.Interop.Word;

// Inside your Windows Forms class:
private void ConvertUsingCutePdf(string sFileName)
{
    Word.ApplicationClass oWord = null;
    try
    {
        if (sFileName.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
        {
            string sPdfFileName = @"c:\ExportedFile.pdf";
            string sPsFileName = @"c:\TempPSFile.ps";

            object fileName = sFileName;
            object oTrue = true;
            object oFalse = false;
            object missing = Missing.Value;

            // Start Word
            oWord = new Word.ApplicationClass();
            oWord.Visible = false;                        // Hide the application from the user
            string sCurrentPrinter = oWord.ActivePrinter; // Remember the active printer
            oWord.ActivePrinter = @"CutePDF Writer";

            // Open the document read-only, without adding it to the recent files list
            oWord.Documents.Open(ref fileName, ref missing, ref oTrue, ref oFalse,
                ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing, ref missing);

            object copies = 1;
            object pages = "";
            object range = Word.WdPrintOutRange.wdPrintAllDocument;
            object items = Word.WdPrintOutItem.wdPrintDocumentContent;
            object pageType = Word.WdPrintOutPages.wdPrintAllPages;
            object outputFilename = sPsFileName;

            // Print the document to a PostScript file. Background is false so that
            // Word finishes printing before PrintOut returns and we close the document.
            oWord.ActiveDocument.PrintOut(ref oFalse, ref oFalse, ref range, ref outputFilename,
                ref missing, ref missing, ref items, ref copies, ref pages, ref pageType,
                ref oTrue, ref oTrue, ref missing, ref oFalse,
                ref missing, ref missing, ref missing, ref missing);

            oWord.Documents.Close(ref oFalse, ref missing, ref missing);
            oWord.ActivePrinter = sCurrentPrinter; // Restore the user's printer

            // Convert the PostScript file into PDF using Ghostscript
            string sArgs = string.Format(
                "-q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=\"{0}\" \"{1}\"",
                sPdfFileName, sPsFileName);
            ProcessStartInfo startInfo = new ProcessStartInfo();
            startInfo.FileName = "gswin32c";
            startInfo.Arguments = sArgs;
            startInfo.CreateNoWindow = true;
            startInfo.WindowStyle = ProcessWindowStyle.Hidden;

            using (Process proc = Process.Start(startInfo))
            {
                // Wait up to 10 seconds for Ghostscript to finish, then give up
                if (!proc.WaitForExit(10000))
                    proc.Kill();
            }

            // Close Word
            oWord.Quit(ref oFalse, ref missing, ref missing);
            oWord = null;

            // Delete the temporary PostScript file
            File.Delete(sPsFileName);
        }
        else
            MessageBox.Show("This program only supports Microsoft Word documents.");
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
    finally
    {
        // Make sure Word is not left running in the background if something failed
        if (oWord != null)
        {
            object oFalse = false;
            object oMissing = Missing.Value;
            oWord.Quit(ref oFalse, ref oMissing, ref oMissing);
        }
    }
}
```
So there you have it! It needs better error handling and some other changes to suit your exact needs, but it is a good starting point.
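Two of the changes you will most likely want are giving each conversion its own file names instead of the fixed `c:\TempPSFile.ps` and `c:\ExportedFile.pdf` (two conversions running at the same time on one machine would otherwise overwrite each other's files), and checking that Ghostscript actually produced a PDF before importing it into the CRM system. Here is a sketch under those assumptions; `ConversionChecks` and its methods are hypothetical names, and placing the PDF next to the source document is just one possible choice.

```csharp
using System;
using System.IO;

// Hypothetical helpers for making the conversion more robust.
public static class ConversionChecks
{
    // Gives each conversion its own temporary PostScript file and puts the PDF
    // next to the source document with the same base name.
    public static void MakeWorkPaths(string docFile, out string psFile, out string pdfFile)
    {
        psFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".ps");
        pdfFile = Path.ChangeExtension(docFile, ".pdf");
    }

    // A PDF file normally starts with the bytes "%PDF-". This catches a missing
    // or empty output file, or one that is not a PDF at all; it does not prove
    // that the rest of the file is complete.
    public static bool LooksLikePdf(string pdfFile)
    {
        if (!File.Exists(pdfFile)) return false;
        byte[] expected = { (byte)'%', (byte)'P', (byte)'D', (byte)'F', (byte)'-' };
        byte[] header = new byte[expected.Length];
        int read = 0;
        using (FileStream fs = File.OpenRead(pdfFile))
        {
            while (read < header.Length)
            {
                int n = fs.Read(header, read, header.Length - read);
                if (n == 0) break; // reached end of file
                read += n;
            }
        }
        if (read < header.Length) return false;
        for (int i = 0; i < expected.Length; i++)
            if (header[i] != expected[i]) return false;
        return true;
    }

    // Success only if Ghostscript exited in time, returned exit code 0
    // and left something that looks like a PDF behind.
    public static bool ConversionSucceeded(bool exitedInTime, int exitCode, string pdfFile)
    {
        return exitedInTime && exitCode == 0 && LooksLikePdf(pdfFile);
    }
}
```

`Process.WaitForExit(10000)` returns false when Ghostscript did not finish in time, and `Process.ExitCode` can only be read once the process has exited, so those two values map directly onto the first two parameters of `ConversionSucceeded`.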
A few notes regarding this code:
- Ghostscript's console executable (gswin32c) is expected to be on the system PATH.
- The Microsoft Word interop assembly (Microsoft.Office.Interop.Word) must be referenced in the project.
- This code can be extended to support all Office programs by checking the input file type.
- The ActivePrinter must be set to a printer that uses a PostScript driver. I used CutePDF because it produces a PostScript file when you tell it to print to a file, which is done with oWord.ActiveDocument.PrintOut(). Any PostScript printer driver should give the same result.
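One way to extend the code to other Office programs is to look at the file extension first and pick the application that should open the file. Here is a minimal sketch, assuming only the Office 2003 binary formats matter. The mapping and the `OfficeApps` class are hypothetical, but the ProgIDs are the standard ones registered by Word, Excel and PowerPoint. Keep in mind that Excel and PowerPoint have their own, different `PrintOut` signatures, so each application still needs its own open-and-print code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical mapping from file extension to the COM ProgID of the
// Office application that should open and print the file.
public static class OfficeApps
{
    private static readonly Dictionary<string, string> ProgIdByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".doc", "Word.Application" },
            { ".xls", "Excel.Application" },
            { ".ppt", "PowerPoint.Application" },
        };

    // Returns the ProgID for the file's extension, or null if unsupported.
    public static string ProgIdFor(string fileName)
    {
        string ext = Path.GetExtension(fileName);
        string progId;
        return ProgIdByExtension.TryGetValue(ext ?? "", out progId) ? progId : null;
    }
}
```

On a machine with Office installed, the returned ProgID could be passed to `Type.GetTypeFromProgID` and then `Activator.CreateInstance` to start the matching application through COM. A null result is the natural place for the "only supports Microsoft Word documents" message.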
