This blog is mainly about Java...

Monday, August 2, 2010

Migrating from JODConverter 2 to JODConverter 3 and converting PDF to PDF/A

In the previous posting I showed you how you could automate conversions of documents to PDF & PDF/A using JODConverter 2.  

JODConverter 3.0.beta has been out for some time, and even though it is still a beta, it is very stable, maybe even more stable than JODConverter 2.

In this blog posting I will highlight the benefits of JODConverter 3 compared to its predecessor and show you how you can modify your code to create PDF/A documents with JODConverter 3.  
To be able to convert an existing PDF document to PDF/A in OpenOffice.org, you will need to install the Sun PDF Import extension!

JODConverter 2 versus 3
JODConverter 3 still uses OpenOffice.org to perform its conversions, and it is still a wrapper around the OOo API. What has changed is the JODConverter core library: it has been completely rewritten and is much cleaner and easier to use.

What's new? 
  • No more init script(!) 
    • You don't have to manually start OpenOffice.org as a service anymore. This is handled automatically.
    • You can even create multiple processes, which is useful for multi-core CPUs. Best practice is one process per CPU core.
  • Automatically restart an OOo instance if it crashes.
    • If for some reason your process crashes, JODConverter will detect this and restart the process automatically. This was a hassle with JODConverter 2, as you had to do this manually on Linux.
  • Abort conversions that take too long (according to a configurable timeout parameter)
  • Automatically restart an OOo instance after n conversions (workaround for OOo memory leaks)
Additionally, the new architecture will make it easier to use the core JODConverter classes as a generic framework for working with OOo, not just for document conversions.
I am sure there will be more features when JODConverter 3 goes final.

Configuration

All you need to do is point the OfficeManager to your OpenOffice.org installation, and you are good to go.

OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome("/usr/lib/openoffice")
        .buildOfficeManager();
officeManager.start(); // start() returns void, so it cannot be chained into the assignment

This manager uses the default settings for the task queue timeout, task execution timeout, port number etc., but you can easily change them:

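OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome("/usr/lib/openoffice")
        .setTaskExecutionTimeout(240000L) // 4 minutes
        .setTaskQueueTimeout(60000L) // 1 minute
        .buildOfficeManager();
officeManager.start(); // start() returns void, so it cannot be chained into the assignment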

If you want to use pipes instead of sockets (again, one process per CPU core is recommended), you need to set a VM argument that points java.library.path to the location of $URE_LIB, which on my Ubuntu machine is /usr/lib/ure/lib/
For instance:
-Djava.library.path="/usr/lib/ure/lib"

Then you can change your OfficeManager:

OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome("/usr/lib/openoffice")
        .setConnectionProtocol(OfficeConnectionProtocol.PIPE)
        .setPipeNames("office1","office2") //two pipes
        .setTaskExecutionTimeout(240000L) //4 minutes
        .setTaskQueueTimeout(60000L)  // 1 minute
        .buildOfficeManager();
officeManager.start(); // start() returns void, so it cannot be chained into the assignment
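
If you want one pipe per CPU core without hard-coding the names, the list can be derived from the number of cores the JVM reports. The following is only a sketch: the class PipeNames and its method are made-up helpers, not part of JODConverter, and the "office" prefix simply mirrors the pipe names used above.

```java
import java.util.Arrays;

public class PipeNames {

    // Hypothetical helper: builds "office1" .. "officeN", one name per process.
    static String[] pipeNames(int processCount) {
        if (processCount < 1) {
            throw new IllegalArgumentException("processCount must be at least 1");
        }
        String[] names = new String[processCount];
        for (int i = 0; i < processCount; i++) {
            names[i] = "office" + (i + 1);
        }
        return names;
    }

    public static void main(String[] args) {
        // One process per core, following the recommendation above
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(Arrays.toString(pipeNames(cores)));
    }
}
```

Assuming setPipeNames accepts a variable number of names, as the two-pipe call above suggests, the resulting array could be passed to it directly.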



ConverterService3Impl
The following code performs all the conversions. It supports a File or a byte[] as input.

This is how you use it:
Let's say you have a PDF file as a byte[], and you want to convert it to PDF/A, also as a byte[].
All you have to do is call this method:



byte[] pdfa = converterService.convertToPDFA(pdfFile);

Similarly, if you have a document (say an OpenOffice.org Writer document) and you want to convert it to PDF, you would call this method:

File doc = new File("myDocument.odt");
File pdfDocument = converterService.convert(doc, ".pdf");

Note that the PDF you get back is always PDF/A compliant. To convert to a different format instead, all you need to do is change the extension from ".pdf" to, say, ".html" and the converter will do the magic.
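
To make the role of the extension more concrete, here is a rough sketch of the decision the service makes before it calls JODConverter. It is a simplified stand-in for the convert method in the source below, not the method itself: the class ConversionRoute and the enum names are invented for illustration, and the input check uses a plain endsWith instead of FilenameUtils.

```java
import java.util.regex.Pattern;

public class ConversionRoute {

    // Illustrative names for the three branches of the convert method
    enum Route { PDF_TO_PDFA_DRAW, DOCUMENT_TO_PDFA, PLAIN_CONVERSION }

    // Same pattern as in the service: "pdf" with an optional leading character, any case
    private static final Pattern PDF_TARGET =
            Pattern.compile("^.?pdf$", Pattern.CASE_INSENSITIVE);

    static Route route(String inputFileName, String targetExtension) {
        boolean inputIsPdf = inputFileName.endsWith(".pdf");
        boolean targetIsPdf = PDF_TARGET.matcher(targetExtension).matches();
        if (inputIsPdf && targetIsPdf) {
            return Route.PDF_TO_PDFA_DRAW;   // PDF in, PDF/A out: DRAWING family
        }
        if (targetIsPdf) {
            return Route.DOCUMENT_TO_PDFA;   // any other document to PDF/A: TEXT family
        }
        return Route.PLAIN_CONVERSION;       // everything else uses the default formats
    }

    public static void main(String[] args) {
        System.out.println(route("report.pdf", ".pdf"));
        System.out.println(route("myDocument.odt", ".pdf"));
        System.out.println(route("myDocument.odt", ".html"));
    }
}
```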


Here is the source. Please read the comments in the source code if you want to understand it, or just ask in the comment section below.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.ejb.Local;
import javax.ejb.Stateless;

import lombok.Cleanup;

import org.apache.commons.io.FilenameUtils;
import org.apache.commons.io.IOUtils;
import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.document.DefaultDocumentFormatRegistry;
import org.artofsolving.jodconverter.document.DocumentFamily;
import org.artofsolving.jodconverter.document.DocumentFormat;
import org.artofsolving.jodconverter.document.DocumentFormatRegistry;

/**
 * This service converts files from one format to another, e.g. ODT to PDF, DOC to ODT etc.
 * @author Shervin Asgari
 *
 */
@Stateless
@Local(ConverterService.class)
public class ConverterService3Impl implements ConverterService {

  private static final String PDF_EXTENSION = ".pdf";
  private static final String PDF = "pdf";  

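  // Values for the "SelectPdfVersion" export option. Uncomment the others when we want to use them

  // private final int PDFXNONE = 0;
  // Despite the constant's name, value 1 is what makes the OOo PDF export filter write PDF/A-1
  private final int PDFX1A2001 = 1;
  // private final int PDFX32002 = 2;
  // private final int PDFA1A = 3;
  // private final int PDFA1B = 4; 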


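  @Logger //Your favourite logger (e.g. Log4J) could be injected here 
  private Log log;

  // The OfficeManager built in the Configuration section. How it gets here (injection,
  // lookup, setter) depends on your application and is not shown in this post
  private org.artofsolving.jodconverter.office.OfficeManager officeManager;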

  public File convert(File inputFile, String extension) throws IOException, ConnectException {
    if (inputFile == null) {
      throw new IOException("The document to be converted is null");
    }

    Pattern p = Pattern.compile("^.?pdf$", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(extension);
    OfficeDocumentConverter converter;
    
    //If the input file is a PDF you need another format registry, one that maps PDF input to the DRAWING family
    if(FilenameUtils.isExtension(inputFile.getName(), PDF) && m.find()) {
      DocumentFormatRegistry formatRegistry = new DefaultDocumentFormatRegistry();
      formatRegistry.getFormatByExtension(PDF).setInputFamily(DocumentFamily.DRAWING);
      converter = new OfficeDocumentConverter(officeManager, formatRegistry);
    } else {
      converter = new OfficeDocumentConverter(officeManager);
    }
    
    String inputExtension = FilenameUtils.getExtension(inputFile.getName());
    File outputFile = File.createTempFile(FilenameUtils.getBaseName(inputFile.getName()), extension);

    try {
      long startTime = System.currentTimeMillis();
      //If both the input and the output file are PDF
      if (FilenameUtils.isExtension(inputFile.getName(), PDF) && m.matches()) {
        //We need to add the DocumentFormat with DRAW
        converter.convert(inputFile, outputFile, toFormatPDFA_DRAW());
      } else if(FilenameUtils.isExtension(outputFile.getName(), PDF)) {
        converter.convert(inputFile, outputFile, toFormatPDFA());
      } else {
        converter.convert(inputFile, outputFile);
      }
      long conversionTime = System.currentTimeMillis() - startTime;
      log.info(String.format("successful conversion: %s [%db] to %s in %dms", inputExtension, inputFile.length(), extension, conversionTime));

      return outputFile;
    } catch (Exception exception) {
      log.error(String.format("failed conversion: %s [%db] to %s; %s; input file: %s", inputExtension, inputFile.length(), extension, exception, inputFile.getName()));
      exception.printStackTrace();
      //Keep the original exception as the cause so callers can see why the conversion failed
      throw new IOException("Converting failed", exception);
    } finally {
      //outputFile.deleteOnExit();
      //inputFile.deleteOnExit();
    }
  }
  
  /**
   * Convert pdf file to pdf/a
   * You will need to install the OpenOffice.org extension (Sun PDF Import) to get it working
   * @param pdfByte the PDF document as a byte array
   * @return the PDF/A document as a byte array
   * @throws IOException
   */
  public byte[] convertToPDFA(byte[] pdfByte) throws IOException, ConnectException {
    @Cleanup InputStream is = new ByteArrayInputStream(pdfByte);
    File pdf = createFile(is, PDF_EXTENSION);
    log.debug("PDF is: #0 #1", pdf.getName(), pdf.isFile());
    return convert(pdf);
  }

  
  private byte[] convert(File pdf) throws IOException {
    if (pdf == null) {
      throw new IOException("The document to be converted is null");
    }

    File convertedPdfA = convert(pdf, PDF_EXTENSION);
    @Cleanup final InputStream inputStream = new BufferedInputStream(new FileInputStream(convertedPdfA));
    byte[] pdfa = IOUtils.toByteArray(inputStream);
    return pdfa;
  }

  /**
   * Creates a temp file and writes the content of InputStream to it. doesn't close input
   * 
   * @return File
   */
  private java.io.File createFile(InputStream in, String extension) throws IOException {
    java.io.File f = File.createTempFile("tmpFile", extension);
    @Cleanup BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(f));
    IOUtils.copy(in, out);
    return f;
  }
  
  /**
   * This DocumentFormat must be used when converting from document (not pdf) to pdf/a
   * For some reason "PDF/A-1" is called "SelectPdfVersion" internally; maybe they plan to add other PdfVersions later.
   */
  private DocumentFormat toFormatPDFA() {
    DocumentFormat format = new DocumentFormat("PDF/A", PDF, "application/pdf");
    Map<String, Object> properties = new HashMap<String, Object>();
    properties.put("FilterName", "writer_pdf_Export");

    Map<String, Object> filterData = new HashMap<String, Object>();
    filterData.put("SelectPdfVersion", this.PDFX1A2001);
    properties.put("FilterData", filterData);

    format.setStoreProperties(DocumentFamily.TEXT, properties);

    return format;
  }
  
  /**
   * This DocumentFormat must be used when converting from pdf to pdf/a
   * For some reason "PDF/A-1" is called "SelectPdfVersion" internally; maybe they plan to add other PdfVersions later.
   */
  private DocumentFormat toFormatPDFA_DRAW() {
    DocumentFormat format = new DocumentFormat("PDF/A", PDF, "application/pdf");
    Map<String, Object> properties = new HashMap<String, Object>();
    properties.put("FilterName", "draw_pdf_Export");

    Map<String, Object> filterData = new HashMap<String, Object>();
    filterData.put("SelectPdfVersion", this.PDFX1A2001);
    properties.put("FilterData", filterData);

    format.setStoreProperties(DocumentFamily.DRAWING, properties);

    return format;
  }

}
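
The createFile helper above relies on Lombok's @Cleanup to close the stream before the file is handed to OpenOffice.org. If you do not use Lombok, the same round trip (byte[] to temp file and back) can be sketched with plain JDK calls that open and close the file themselves. This assumes Java 7 or newer for java.nio.file.Files; the class and method names are made up for the example and are not part of the service.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempFileRoundTrip {

    // Writes the bytes to a new temp file; Files.write closes the file before returning
    static File writeToTempFile(byte[] content, String extension) throws IOException {
        File f = File.createTempFile("tmpFile", extension);
        Files.write(f.toPath(), content);
        return f;
    }

    // Reads the whole file back into memory, e.g. the converted PDF/A
    static byte[] readBack(File f) throws IOException {
        return Files.readAllBytes(f.toPath());
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "%PDF-1.4".getBytes("US-ASCII");
        File tmp = writeToTempFile(original, ".pdf");
        System.out.println(tmp.getName() + " holds " + readBack(tmp).length + " bytes");
        tmp.delete(); // clean up the temp file once it is no longer needed
    }
}
```

Whichever variant you choose, the important part is that the output stream is closed before the conversion starts, otherwise OpenOffice.org may see an incomplete file.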


Remember to stop the OfficeManager by calling officeManager.stop() when your application quits or shuts down.
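
One way to make sure this happens in a standalone application is a JVM shutdown hook. The sketch below is deliberately independent of JODConverter: the stop action is passed in as a Runnable, and in a real application that action would be the call to officeManager.stop(). The class and method names are assumptions for illustration, and inside an EJB container a lifecycle callback is probably the better place for this.

```java
public class ShutdownExample {

    // Registers a JVM shutdown hook that runs the given stop action on exit
    static Thread registerStopHook(Runnable stopAction) {
        Thread hook = new Thread(stopAction, "office-manager-stop");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // Stand-in for officeManager.stop(); replace with the real call in your application
        registerStopHook(() -> System.out.println("stopping office manager"));
        System.out.println("application running");
    }
}
```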

43 comments:

Krazy 'Em said...

Hi, first of all, thanks for your code snippet, it was very useful!

However, I would like to share one related issue:
I modified your convertToPDFA method so I can export whatever file extension I want to PDF, by adding an extra parameter called format and refactoring some internal private methods (I won't go into details because this text box is so small).

The fact is that I got an exception like the following:

org.artofsolving.jodconverter.office.OfficeException: could not load document: tmpFile4044091585053733490.doc

I solved this problem by closing the BufferedOutputStream you use in the private createFile method, such as:

IOUtils.copy(in, out);
out.close();

I hope this is useful for anyone who runs into the same problem.

Regards

Shervin Asgari said...

Is it this method you mean?

private java.io.File createFile(InputStream in, String extension) throws IOException {
  java.io.File f = File.createTempFile("tmpFile", extension);
  @Cleanup BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(f));
  IOUtils.copy(in, out);
  return f;
}

Actually, the @Cleanup here is automatically closing the BufferedOutputStream. Have a look here: http://projectlombok.org/features/Cleanup.html

Krazy 'Em said...

Yes, I mean that method, but I made a mistake.

My project is Spring-based so I removed EJB annotations and I also removed @Cleanup by mistake ¬¬'

I'm so sorry for making you spend your time on this.

Thank you for your response!

Shervin Asgari said...

No worries :-)

Krazy 'Em said...

Hello again Shervin,

I've been trying to add additional export options to your toFormatPDFA method as follows:

filterData.put("EncryptFile", Boolean.TRUE);
filterData.put("DocumentOpenPassword", "1234");
filterData.put("Changes", 0);
filterData.put("EnableCopyingOfContent", Boolean.FALSE);
filterData.put("Printing", 0);

properties.put("FilterData", filterData);
format.setStoreProperties(DocumentFamily.TEXT, properties);

The file encryption is working, so the PDF viewer asks me to enter a password, but other options such as Changes, EnableCopyingOfContent and Printing (described in the table from http://www.artofsolving.com/node/18) still do not work.

Have you run into this?

Thanks in advance.

Shervin Asgari said...

Hi.

Sorry, I have not encountered this. You should ask in the jodconverter forums, or the openoffice forum, or look in the openoffice documentation.

So I am guessing, if you remove the encryption bit, then the other options work? If so, it is either a bug or feature, and it probably comes from openoffice and not jodconverter. So best to start at openoffice

Krazy 'Em said...

Thanks for your answer. I already posted that issue in the Jodconverter google group before, but I haven't got an answer yet.

There are also previous related posts, but they refer to JODConverter v2.

I also tried removing the encryption (actually I only added encryption to check whether at least that feature works).

I'm following your advice and checking the OpenOffice forums / docs.

Thanks for your attention.

Fran Díaz said...

I finally figured it out!

It is necessary to set both RestrictPermissions and PermissionPassword as well in order to make the security restrictions work:

filterData.put("RestrictPermissions", Boolean.TRUE);
filterData.put("PermissionPassword", "whatever");

Regards.

Shervin Asgari said...

Glad you sorted it out...

Anonymous said...

Hi Shervin, thank you for your example, but I'm getting the following exception: "org.artofsolving.jodconverter.office.OfficeException: unsupported conversion" when trying pdf to pdf/a.

Can you help me?

P.S.: I'm using jodconverter-core-3.0 and OpenOffice.org 3

Shervin Asgari said...

I need some more information before I am able to help.

Paste a stacktrace to pastebin.com and paste the link here, also show what code you are executing which is throwing this exception. Are you following my example to the letter? Or have you changed anything?

JJ said...

Thanks for your quick answer.

The changes I made were to remove the EJB annotations and pass the OfficeManager to the methods.

I'm using it as a Java project with a class with a main method.

The important code is:

OfficeManager officeManager = new DefaultOfficeManagerConfiguration().setOfficeHome("C:\\Program Files\\OpenOffice.org 3").buildOfficeManager();

officeManager.start();

byte[] pdfa = conversor.convertToPDFA(pdfBytes, officeManager);

JJ said...

The exception is:

jodconverter.office.OfficeException: unsupported conversion
at jodconverter.AbstractConversionTask.storeDocument(AbstractConversionTask.java:114)
at jodconverter.AbstractConversionTask.execute(AbstractConversionTask.java:64)
at jodconverter.office.PooledOfficeManager$2.run(PooledOfficeManager.java:92)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:735)
java.io.IOException: Converting failed: unsupported conversion
at main.ConverterService3Impl.convert(ConverterService3Impl.java:91)
at main.ConverterService3Impl.convert(ConverterService3Impl.java:139)
at main.ConverterService3Impl.convertToPDFA(ConverterService3Impl.java:105)
at main.Main.main(Main.java:36)

Shervin Asgari said...

That's not the entire stacktrace. I need the entire stacktrace, and please use pastebin.com or something similar to paste the stacktrace.

JJ said...

Sorry about the stacktrace. Everything I can see I put on http://pastebin.com/5xRg2sFc

If it isn't enough I can send you the project by email.

If it takes you a long time, don't worry, because it is only a personal test.

Thanks a lot for your time.

JJ said...

About the OpenOffice extensions: I looked for "pdf viewer" and got "Sun PDF Import Extension". Is that correct?

Maybe that is the problem.

Shervin Asgari said...

Yes, you must have that installed before you can convert pdf to pdf/a.

I updated my blog with the information about the Sun pdf viewer extension. Thank you for reminding me about it.

JJ said...

I have the Sun PDF Import Extension installed but the error is the same.

Shervin Asgari said...

Make sure you can open a pdf document inside openoffice writer.

JJ said...

Yes, I did it

Alonso said...

Hi Shervin!

I need to convert a .pdf file or .doc file to PDF/A-1.

My laptop has this configuration:
alonso@ubuntu:/tmp$ uname -ra
Linux ubuntu 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:48:22 UTC 2010 i686 GNU/Linux


alonso@ubuntu:/tmp$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Server VM (build 17.1-b03, mixed mode)

OpenOffice 3.2 installed from the Ubuntu repos, and Sun PDF Import extension v1.0.1 already installed.

I have a class where I initialize officeManager:

This is the constructor:

public DMDocumentManager(
org.mju.gendoc.pool.DocumentParserPool documentParserPool) {
logger.info("constructor DMDocumentManager...");
this.documentParserPool = documentParserPool;
try {
if (this.officeManager == null) {
this.officeManager = new DefaultOfficeManagerConfiguration()
.setOfficeHome("/usr/lib/openoffice")
.setConnectionProtocol(OfficeConnectionProtocol.SOCKET)
.setTaskExecutionTimeout(30000L)
.setPortNumbers(8100,8200, 8300, 8400, 8500).buildOfficeManager();
officeManager.start();
this.converterService = new ConverterService3Impl();
}
} catch (Throwable th) {
logger.error("ERROR starting the office manager..."
+ th.getMessage());
th.printStackTrace();
}
logger.info("END constructor DMDocumentManager...");
}



In the same class I have this method:

public OOoOutputStream crearPDFA_v3_beta(OOoInputStream inputStream,
String generatedFileName) throws Exception {

byte[] aBytesDoc = new byte[32768];
while (true) {
int n = inputStream.read(aBytesDoc);
if (n < 0)
break;
}
byte[] aBytesPDFA = this.converterService.convertToPDFA(aBytesDoc,
this.officeManager);
OOoOutputStream ooos = new OOoOutputStream();
ooos.write(aBytesPDFA);
return ooos;
}

And finally my JUnit test:

public void testGeneratePDFA() {

// this is giving you an error 2074 or something like that...
logger.info("INIT testGeneratePDFA...");
String generatedFileName = "generated-poolA.pdf";
OOoInputStream in=null;
try {
InputStream fichero = new FileInputStream( "/home/alonso/gendoc/files/generated-pool.pdf" );
in = new OOoInputStream(ManejadorInputStream.getBytes(fichero));
OutputStream doc = serviceImpl.generatePDF_A(in,generatedFileName);
assertNotNull(doc);
logger.info("PDF/A-1 document generated using the OO server pool..." + doc);
} catch (FileNotFoundException e1) {
e1.printStackTrace();
assertNotNull(null);
} catch (RemoteException e) {
logger.info("REMOTE exception caught in the test while building the PDF/A-1...");
assertNotNull(null);
} catch(Exception e) {
logger.info("Generic exception caught in the test...");
assertNotNull(null);
}
logger.info("END testGeneratePDFA...");
}

If I run the test I get an exception like this:

http://pastebin.com/yDVmXH8s

The temp file that the error points to is the original PDF, written to the /tmp directory!

And if I stop execution at a breakpoint, I can open that tmp file with pdfcube or a similar viewer.

I hope you can help me because I'm frustrated!

I think I have already replied to you in the Google group!

Shervin Asgari said...

Hi.

I am very busy at the moment. I don't have time to check this out right now.

But as soon as I get some time I will have a look.

Alonso said...

ok, thank you shervin!

Alonso said...

sorry for the double post!

Dula said...

Hi Shervin,

I am only using the following methods:

private void convert_PDF_A(File inputFile){

OfficeManager officeManager =null;
try{
officeManager = new DefaultOfficeManagerConfiguration().buildOfficeManager();
officeManager.start();

DocumentFormatRegistry formatRegistry = new DefaultDocumentFormatRegistry();
formatRegistry.getFormatByExtension("pdf").setInputFamily(DocumentFamily.DRAWING);
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager, formatRegistry);

File outputFile = new File("E:/L4_project/sample/file_format_pdfA.pdf");

converter.convert(inputFile, outputFile, toFormatPDFA_DRAW());


}catch(Exception e){
e.printStackTrace();

}finally {
officeManager.stop();
}

}

private DocumentFormat toFormatPDFA_DRAW(){

DocumentFormat format = new DocumentFormat("PDF/A", "pdf", "application/pdf");

Map properties = new HashMap();

properties.put("FilterName", "draw_pdf_Export");

Map filterData = new HashMap();

filterData.put("SelectPdfVersion", this.PDFX1A2001);

properties.put("FilterData", filterData);

format.setStoreProperties(DocumentFamily.DRAWING, properties);


return format;

}

but I got the exception below:
org.artofsolving.jodconverter.office.OfficeException: could not load document: file_format.pdf
at org.artofsolving.jodconverter.AbstractConversionTask.loadDocument(AbstractConversionTask.java:101)
at org.artofsolving.jodconverter.AbstractConversionTask.execute(AbstractConversionTask.java:62)
at org.artofsolving.jodconverter.office.PooledOfficeManager$2.run(PooledOfficeManager.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Shervin Asgari said...

The error message says it cannot load the file_format.pdf file.
Can you make sure this file exists and is readable?

Dula said...

The file exists and is readable.

Shervin Asgari said...

Well, something is wrong from JODConverter's point of view.

For example:

File inputFile = new File("E:/L4_project/sample/file_format.pdf.pdf");

System.out.println(inputFile.isFile() + " " + inputFile.canRead());

If both print true, then the error is somewhere else in your code.

This comment section is not the place for this type of problem. I suggest you use the JODConverter Google group to get help.

Dula said...

oh! sorry about that...

abdelmalek said...

Hi Shervin,
Your post is very interesting. I'm actually trying to use your code, but it seems that it is not possible to convert from *.pdf to *.pdf/a ...
After reviewing the source code of the libraries, I conclude that the problem comes from the method:
private DocumentFormat toFormatPDFA_DRAW()
I think it is not returning the correct format to execute the next step:
converter.convert(inputFile, outputFile, toFormatPDFA_DRAW());

It gives me these errors:
org.artofsolving.jodconverter.office.OfficeException: unsupported conversion
at org.artofsolving.jodconverter.AbstractConversionTask.storeDocument(AbstractConversionTask.java:113)
at org.artofsolving.jodconverter.AbstractConversionTask.execute(AbstractConversionTask.java:63)
at org.artofsolving.jodconverter.office.PooledOfficeManager$2.run(PooledOfficeManager.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
java.io.IOException: Converting failed
at ConversorImpl.ConverterService3Impl.convert(ConverterService3Impl.java:100)
at ConversorImpl.ConverterService3Impl.main(ConverterService3Impl.java:242)


Wishing you a happy new year and many thanks!
Malek

Shervin Asgari said...

Hi abdelmalek.

This code is what I use in production, so it works fine for me.
Are you sure you have installed the pdfimport.oxt file for OpenOffice? Without it you cannot convert pdf to pdf/a. (http://extensions.services.openoffice.org/project/pdfimport)

To test, try to open a PDF document in OpenOffice.

Kai said...

Hi Shervin,

Thanks for your code for the PDF/A conversion. I was able to use it to convert a Microsoft Word document to a PDF/A file in a Windows environment with OpenOffice 3.3 and JODConverter 3. However, I could not get it to work in a Linux environment. I got the following exception:

http://pastebin.com/9HwU2N1V

Do you know of anyone who has successfully used your code in a Linux environment? If yes, with which versions of OpenOffice and JODConverter?

Thanks very much.

Shervin Asgari said...

I am using Linux (Ubuntu & RHEL).
It works there also.

I am on vacation with limited internet access.
I will look at it more closely when I get back next week.

Regards Shervin

Shervin Asgari said...

Hi Kai.

It's extremely hard to make a guess here. The stacktrace is also not so long.
The message does say something along the lines of: exception type not found: vigra.PreconditionViolation.

Seems like it cannot find your exception type vigra.PreconditionViolation. Have you remembered to include this in your distribution/classpath?

srikanth said...

Hi Shervin

Thank you for your excellent blog, which has ended my search for PDF to PDF/A conversion. I used your code snippet and tried to run a PDF to PDF/A conversion, but whatever I do, I keep getting an "unsupported conversion" exception. I tried other formats through the same code and they were successful, all except PDF to PDF/A! I am executing this on my Windows XP machine. I installed OpenOffice 3.3 with JODConverter 3 and also installed the importPDF.oxt extension. I was able to create a PDF/A from a PDF through the GUI but not through code! Can you please help me with this? I have been working on this issue for a week and my mind is stuck on it at the moment!

I appreciate your help in this regard.

Shervin Asgari said...

Hi.

Please use the JODConverter google group for assistance. I might be able to help there.

http://groups.google.com/group/jodconverter

Anonymous said...

Nice tutorial!

I just want to add a note. I hope this can help others with the same problem.

After installing pdfimporter, I can use JODConverter to convert pdf to pdf/A without an exception. However, the contents of the generated pdf/A are messed up. I got something like:
%PDF-1.4
1.4%Çì¢
5 0 obj
<>
stream
....

After searching in the JODConverter forum, I found out that whether the extension is installed for the current user or for all users is what causes this issue. Details:
http://groups.google.com/group/jodconverter/browse_thread/thread/7d082aa354d15959/eea838360e1a2873?lnk=gst&q=pdf+to+pdf#eea838360e1a2873

Now I have installed the extension for all users on Windows and the problem has gone away.

Cheers

Raja Sekhar Chaliki said...

Another major difference between JODConverter 2 and JODConverter 3 is that version 3 does not support remote OpenOffice connections.

http://groups.google.com/group/jodconverter/browse_thread/thread/f4505c24545dd395/f3d676314a0ae316#f3d676314a0ae316

101000 said...

Hi, for those who got this exception:


org.artofsolving.jodconverter.office.OfficeException: unsupported conversion
at org.artofsolving.jodconverter.AbstractConversionTask.storeDocument(AbstractConversionTask.java:113)
at org.artofsolving.jodconverter.AbstractConversionTask.execute(AbstractConversionTask.java:63)
at org.artofsolving.jodconverter.office.PooledOfficeManager$2.run(PooledOfficeManager.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
java.io.IOException: Converting failed
at ConversorImpl.ConverterService3Impl.convert(ConverterService3Impl.java:100)
at ConversorImpl.ConverterService3Impl.main(ConverterService3Impl.java:242)


This problem (at least in a Windows environment) is due to the extension installation permissions.
You have to install it by running this command in an administrator cmd (please make sure all OOo instances are closed before running it):


C:\Program Files (x86)\OpenOffice.org 3\program>unopkg add --shared oracle-pdfimport.oxt

Sarika S said...

Hi,
I get the exception
org.artofsolving.jodconverter.office.OfficeException: failed to start and connect
Please help me. Unfortunately I only found this site just now. I have been stuck on this issue for 2 days.

My code is given below. Please help me.

File inputFile = new File(inputPath);
File outputFile = new File(outputPath);


OfficeManager officeManager = new DefaultOfficeManagerConfiguration().buildOfficeManager();
officeManager.start();

DocumentFormat docFormat = new DocumentFormat("Portable Document Format", "pdf", "application/pdf");
Map map = new HashMap();
map.put("FilterName", "writer_pdf_Export");
PropertyValue[] aFilterData = new PropertyValue[1];
aFilterData[0] = new PropertyValue();
aFilterData[0].Name = "SelectPdfVersion";
aFilterData[0].Value = 1;
map.put("FilterData", aFilterData);
docFormat.setStoreProperties(DocumentFamily.TEXT, map);

OfficeDocumentConverter docConverter = new OfficeDocumentConverter(officeManager);
docConverter.convert(inputFile, outputFile, docFormat);

officeManager.stop();

Shervin Asgari said...

Have you tried the jodconverter google groups? You might get help there.

I don't work on JODConverter anymore, so I can't help.
But I can give you the following tips:

Start small. Take one of the examples from the jodconverter site and run it. Make sure at least your setup works and that you can get something converted.

If that works, then you can try to figure out why your conversion doesn't work.
