This blog is mainly about Java...

Friday, September 17, 2010

Devoxx versus JavaOne

Devoxx or JavaOne?
That's the question...

It's really not that difficult to choose.
If you are based in Europe (as our company is), then you will for sure get more value for your money attending Devoxx instead of JavaOne.

However, even from a technical perspective, Devoxx still comes out on top in my opinion. The opening talk is by Mark Reinhold, and one of the last talks, "Java State of the Union", is by none other than James Gosling (I don't even need to link to him; every Java developer should know who he is). There are also tons of famous speakers: Brian Goetz, The JavaPosse, Heinz Kabutz, Richard Bair and Roberto Chinnici, just to name a few (no offense intended to those I didn't mention).

So it shouldn't come as a big surprise that I am also attending Devoxx. If you are going, lemme know and we can hook up!

Sunday, September 12, 2010

Java 7, yet another delay

Mark Reinhold has published a blog post stating what has been painfully obvious to everyone following JDK 7 development: it will be delayed yet again, until mid-2012(!)

Mark is further saying that there is an alternative which they are considering, and that "is to take everything we have now, test and stabilize it, and ship that as JDK 7. We could then finish Lambda, Jigsaw, the rest of Coin, and maybe a few additional key features in a JDK 8 release which would ship fairly soon thereafter."

I couldn't agree more. The community has waited too long for Java 7. The current version of Java has so many problems that people are looking around for alternative languages on the Java Virtual Machine.
I am certain that if Java 7 is delayed by yet another two years, most people will by then have moved to other JVM languages such as Scala, or Groovy with Grails, which don't have the problems Java has today.

So, to sum up: Oracle has my vote to ship whatever they have now and deliver the rest with JDK 8.

Monday, August 2, 2010

Migrating from JODConverter 2 to JODConverter 3 and converting PDF to PDF/A

In my previous post I showed you how to automate the conversion of documents to PDF & PDF/A using JODConverter 2.

JODConverter 3.0.beta has been out for some time, and even though it is still beta, it is very stable. Maybe even more stable than JODConverter 2.

In this blog posting I will highlight the benefits of JODConverter 3 compared to its predecessor and show you how you can modify your code to create PDF/A documents with JODConverter 3.  
To be able to convert an existing PDF document to PDF/A in OpenOffice.org, you will need to install the Sun PDF Import Extension!

JODConverter 2 versus 3
JODConverter 3 still uses OpenOffice.org to perform its conversions and is still a wrapper around the OOo API. What has changed is that the JODConverter core library has been completely rewritten, and it is much cleaner and easier to use.

What's new?
  • No more init script(!)
    • You don't have to manually start OpenOffice.org as a service anymore. This is handled automatically.
    • You can even run multiple processes, which is useful on multi-core CPUs. Best practice is one process per CPU core.
  • Automatically restart an OOo instance if it crashes.
    • If for some reason your process crashes, JODConverter will detect this and restart the process automatically. This was a hassle with JODConverter 2, as you had to do it manually on Linux.
  • Abort conversions that take too long (according to a configurable timeout parameter)
  • Automatically restart an OOo instance after n conversions (workaround for OOo memory leaks)
Additionally the new architecture will make it easier to use the core JODConverter classes as a generic framework for working with OOo - not just limited to document conversions.
I am sure there will be more features when JODConverter 3 goes final.

Configuration

All you need to do is point the OfficeManager to your OpenOffice.org installation, and you are good to go.

OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome("/usr/lib/openoffice")
        .buildOfficeManager();
officeManager.start(); // start() returns void, so it cannot be chained into the assignment

This manager uses the default settings for the task queue timeout, task execution timeout, port number, etc., but you can easily change them:

OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome("/usr/lib/openoffice")
        .setTaskExecutionTimeout(240000L) // 4 minutes
        .setTaskQueueTimeout(60000L)      // 1 minute
        .buildOfficeManager();
officeManager.start();
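Both timeouts are given in milliseconds: 240000L is 4 minutes and 60000L is 1 minute. If you prefer not to count zeros, java.util.concurrent.TimeUnit can compute the values for you. A small sketch; the holder class OfficeTimeouts is just for illustration:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical holder for the timeout values passed to the OfficeManager configuration
final class OfficeTimeouts {
    // 4 minutes, the value passed to setTaskExecutionTimeout(...)
    static final long TASK_EXECUTION_TIMEOUT = TimeUnit.MINUTES.toMillis(4);
    // 1 minute, the value passed to setTaskQueueTimeout(...)
    static final long TASK_QUEUE_TIMEOUT = TimeUnit.MINUTES.toMillis(1);

    public static void main(String[] args) {
        System.out.println(TASK_EXECUTION_TIMEOUT); // 240000
        System.out.println(TASK_QUEUE_TIMEOUT);     // 60000
    }
}
```

You would then write .setTaskExecutionTimeout(OfficeTimeouts.TASK_EXECUTION_TIMEOUT) instead of the raw literal.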

If you want to use named pipes instead of TCP sockets, for instance with one office process per CPU core as recommended above, you need to set a VM argument pointing java.library.path to the location of $URE_LIB, which on my Ubuntu machine is /usr/lib/ure/lib/.
For instance:
-Djava.library.path="/usr/lib/ure/lib"

Then you can configure your OfficeManager to use pipes:

OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome("/usr/lib/openoffice")
        .setConnectionProtocol(OfficeConnectionProtocol.PIPE)
        .setPipeNames("office1","office2") //two pipes
        .setTaskExecutionTimeout(240000L) //4 minutes
        .setTaskQueueTimeout(60000L)  // 1 minute
        .buildOfficeManager();
officeManager.start();
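Rather than hard-coding "office1" and "office2", you could derive one pipe name per CPU core, following the one-process-per-core rule of thumb mentioned earlier. The helper below is a hypothetical sketch, not part of JODConverter; it assumes setPipeNames accepts its names as varargs (as the two-argument call above suggests), so the returned array could be passed straight in.

```java
import java.util.Arrays;

// Hypothetical helper: builds "office1", "office2", ... for setPipeNames(...)
final class PipeNames {
    static String[] forProcesses(int processCount) {
        if (processCount < 1) {
            throw new IllegalArgumentException("need at least one process");
        }
        String[] names = new String[processCount];
        for (int i = 0; i < processCount; i++) {
            names[i] = "office" + (i + 1);
        }
        return names;
    }

    public static void main(String[] args) {
        // One office process per CPU core
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(Arrays.toString(forProcesses(cores)));
    }
}
```

Usage would look like .setPipeNames(PipeNames.forProcesses(Runtime.getRuntime().availableProcessors())).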



ConverterService3Impl
The following code performs all the conversions. It accepts either a File or a byte[] as input.

This is how you use it:
Let's say you have a PDF file as a byte[], and you want to convert it to PDF/A, also as a byte[].
All you have to do is call:



byte[] pdfa = converterService.convertToPDFA(pdfFile);

Similarly, if you have a document (say, an OpenOffice.org Writer document) and you want to convert it to PDF, you would call:

File doc = new File("myDocument.odt");
File pdfDocument = converterService.convert(doc, ".pdf");

Note that whenever the target is PDF, you will always get a PDF/A-compliant PDF. To convert to another format, just change the extension, e.g. from ".pdf" to ".html", and the converter does the magic.
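Whether the service applies the PDF/A export format is decided purely by the target extension string. Here is a small stand-alone sketch of that check (the class TargetFormat is hypothetical). Note that the dot has to be escaped: an unescaped "." in "^.?pdf$" matches any character, so that pattern would also accept "xpdf".

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the "is the target PDF?" check in the converter service
final class TargetFormat {
    // Optional literal dot, then "pdf", case-insensitive
    private static final Pattern PDF_TARGET =
            Pattern.compile("^\\.?pdf$", Pattern.CASE_INSENSITIVE);

    static boolean isPdf(String extension) {
        return extension != null && PDF_TARGET.matcher(extension).matches();
    }

    public static void main(String[] args) {
        for (String ext : new String[] {".pdf", "PDF", ".html", "xpdf"}) {
            System.out.println(ext + " -> " + isPdf(ext));
        }
    }
}
```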


Here is the source. Please read the comments in the source code if you want to understand it, or just ask in the comment section below.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.ejb.Local;
import javax.ejb.Stateless;

import lombok.Cleanup;

import org.apache.commons.io.FilenameUtils;
import org.apache.commons.io.IOUtils;
import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.document.DefaultDocumentFormatRegistry;
import org.artofsolving.jodconverter.document.DocumentFamily;
import org.artofsolving.jodconverter.document.DocumentFormat;
import org.artofsolving.jodconverter.document.DocumentFormatRegistry;
import org.jboss.seam.annotations.Logger;
import org.jboss.seam.log.Log;

/**
 * This service converts files from one format to another, e.g. ODT to PDF, DOC to ODT etc.
 * @author Shervin Asgari
 *
 */
@Stateless
@Local(ConverterService.class)
public class ConverterService3Impl implements ConverterService {

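  private static final String PDF_EXTENSION = ".pdf";
  private static final String PDF = "pdf";

  // Values for the "SelectPdfVersion" export option. Uncomment the others when you need them.
  // private final int PDFXNONE = 0;
  private final int PDFX1A2001 = 1;
  // private final int PDFX32002 = 2;
  // private final int PDFA1A = 3;
  // private final int PDFA1B = 4;

  // The started OfficeManager from the Configuration section above. How it gets here
  // (Seam @In, a singleton, a setter, ...) is up to your application.
  private org.artofsolving.jodconverter.office.OfficeManager officeManager;

  @Logger // Seam injects its Log here; use your favourite logger (e.g. Log4J) if you prefer
  private Log log;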

  public File convert(File inputFile, String extension) throws IOException, ConnectException {
    if (inputFile == null) {
      throw new IOException("The document to be converted is null");
    }

    // Matches the target extensions ".pdf" and "pdf" (case-insensitive); the dot must be escaped
    Pattern p = Pattern.compile("^\\.?pdf$", Pattern.CASE_INSENSITIVE);
    boolean pdfToPdf = FilenameUtils.isExtension(inputFile.getName(), PDF) && p.matcher(extension).matches();
    OfficeDocumentConverter converter;

    //If the input file is a PDF you need another format registry, where PDF input is read as DRAWING
    if (pdfToPdf) {
      DocumentFormatRegistry formatRegistry = new DefaultDocumentFormatRegistry();
      formatRegistry.getFormatByExtension(PDF).setInputFamily(DocumentFamily.DRAWING);
      converter = new OfficeDocumentConverter(officeManager, formatRegistry);
    } else {
      converter = new OfficeDocumentConverter(officeManager);
    }

    String inputExtension = FilenameUtils.getExtension(inputFile.getName());
    // File.createTempFile() requires a prefix of at least three characters
    String prefix = FilenameUtils.getBaseName(inputFile.getName());
    File outputFile = File.createTempFile(prefix.length() < 3 ? prefix + "___" : prefix, extension);

    try {
      long startTime = System.currentTimeMillis();
      if (pdfToPdf) {
        //Both input and output are PDF, so we need the DocumentFormat with the draw export filter
        converter.convert(inputFile, outputFile, toFormatPDFA_DRAW());
      } else if(FilenameUtils.isExtension(outputFile.getName(), PDF)) {
        converter.convert(inputFile, outputFile, toFormatPDFA());
      } else {
        converter.convert(inputFile, outputFile);
      }
      long conversionTime = System.currentTimeMillis() - startTime;
      log.info(String.format("successful conversion: %s [%db] to %s in %dms", inputExtension, inputFile.length(), extension, conversionTime));

      return outputFile;
    } catch (Exception exception) {
      log.error(String.format("failed conversion: %s [%db] to %s; %s; input file: %s", inputExtension, inputFile.length(), extension, exception, inputFile.getName()));
      exception.printStackTrace();
      // Keep the original exception as the cause so callers can see what went wrong
      throw new IOException("Converting failed", exception);
    } finally {
      //outputFile.deleteOnExit();
      //inputFile.deleteOnExit();
    }
  }
  
  /**
   * Convert a PDF (as bytes) to PDF/A.
   * You will need to install the OpenOffice.org Sun PDF Import Extension to get it working.
   * @param pdfByte the PDF document as a byte array
   * @return the PDF/A document as a byte array
   * @throws IOException
   */
  public byte[] convertToPDFA(byte[] pdfByte) throws IOException, ConnectException {
    @Cleanup InputStream is = new ByteArrayInputStream(pdfByte);
    File pdf = createFile(is, PDF_EXTENSION);
    log.debug("PDF is: #0 #1", pdf.getName(), pdf.isFile());
    return convert(pdf);
  }

  
  private byte[] convert(File pdf) throws IOException {
    if (pdf == null) {
      throw new IOException("The document to be converted is null");
    }

    File convertedPdfA = convert(pdf, PDF_EXTENSION);
    @Cleanup final InputStream inputStream = new BufferedInputStream(new FileInputStream(convertedPdfA));
    byte[] pdfa = IOUtils.toByteArray(inputStream);
    return pdfa;
  }

  /**
   * Creates a temp file and writes the content of the InputStream to it. Does not close the input stream.
   * 
   * @return File
   */
  private java.io.File createFile(InputStream in, String extension) throws IOException {
    java.io.File f = File.createTempFile("tmpFile", extension);
    @Cleanup BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(f));
    IOUtils.copy(in, out);
    return f;
  }
  
  /**
   * This DocumentFormat must be used when converting from document (not pdf) to pdf/a
   * For some reason "PDF/A-1" is called "SelectPdfVersion" internally; maybe they plan to add other PdfVersions later.
   */
  private DocumentFormat toFormatPDFA() {
    DocumentFormat format = new DocumentFormat("PDF/A", PDF, "application/pdf");
    Map<String, Object> properties = new HashMap<String, Object>();
    properties.put("FilterName", "writer_pdf_Export");

    Map<String, Object> filterData = new HashMap<String, Object>();
    filterData.put("SelectPdfVersion", this.PDFX1A2001);
    properties.put("FilterData", filterData);

    format.setStoreProperties(DocumentFamily.TEXT, properties);

    return format;
  }
  
  /**
   * This DocumentFormat must be used when converting from pdf to pdf/a
   * For some reason "PDF/A-1" is called "SelectPdfVersion" internally; maybe they plan to add other PdfVersions later.
   */
  private DocumentFormat toFormatPDFA_DRAW() {
    DocumentFormat format = new DocumentFormat("PDF/A", PDF, "application/pdf");
    Map<String, Object> properties = new HashMap<String, Object>();
    properties.put("FilterName", "draw_pdf_Export");

    Map<String, Object> filterData = new HashMap<String, Object>();
    filterData.put("SelectPdfVersion", this.PDFX1A2001);
    properties.put("FilterData", filterData);

    format.setStoreProperties(DocumentFamily.DRAWING, properties);

    return format;
  }

}
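toFormatPDFA() and toFormatPDFA_DRAW() build the same store properties and differ only in the export filter name ("writer_pdf_Export" vs "draw_pdf_Export") and the DocumentFamily they are registered for. If you want to remove that duplication, a hypothetical helper like the one below could build the map; it is plain JDK code, so it runs without OpenOffice.org. You would then pass its result to format.setStoreProperties(family, ...).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for the store properties shared by both PDF/A formats
final class PdfaStoreProperties {
    // Same "SelectPdfVersion" value as PDFX1A2001 in the service above
    static final int PDFA_VERSION = 1;

    /** filterName: "writer_pdf_Export" for text documents, "draw_pdf_Export" for PDF input. */
    static Map<String, Object> forFilter(String filterName) {
        Map<String, Object> filterData = new HashMap<String, Object>();
        filterData.put("SelectPdfVersion", PDFA_VERSION);

        Map<String, Object> properties = new HashMap<String, Object>();
        properties.put("FilterName", filterName);
        properties.put("FilterData", filterData);
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(forFilter("writer_pdf_Export"));
        System.out.println(forFilter("draw_pdf_Export"));
    }
}
```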


Remember to stop the OfficeManager (officeManager.stop()) when your application quits or shuts down.

Wednesday, May 12, 2010

Automate converting of documents to PDF & PDF/A using JODConverter 2

In this blog post I will show how to convert existing documents to PDF/A using an open source library called JODConverter.
Note that nothing prevents you from converting to normal PDF as well.

JODConverter leverages OpenOffice.org, which provides arguably the best import/export filters for OpenDocument and Microsoft Office formats available today. Thus, it requires an installation of OpenOffice and it supports all documents which OpenOffice supports.

JODConverter automates all conversions supported by OpenOffice.org, including
  • Microsoft Office to OpenDocument, and vice versa
    • Word to OpenDocument Text (odt); OpenDocument Text (odt) to Word
    • Excel to OpenDocument Spreadsheet (ods); OpenDocument Spreadsheet (ods) to Excel
    • PowerPoint to OpenDocument Presentation (odp); OpenDocument Presentation (odp) to PowerPoint
  • Any format to PDF
    • OpenDocument (Text, Spreadsheet, Presentation) to PDF
    • Word to PDF; Excel to PDF; PowerPoint to PDF
    • RTF to PDF; WordPerfect to PDF; ...
  • And more
    • OpenDocument Presentation (odp) to Flash; PowerPoint to Flash
    • RTF to OpenDocument; WordPerfect to OpenDocument
    • Any format to HTML (with limitations)
    • Support for OpenOffice.org 1.0 and old StarOffice formats
    • ...
JODConverter can be used in many different ways:
  • As a Java library, embedded in your own Java application
  • As a command line tool, possibly invoked from your own scripts
  • As a simple web application: upload your input document, select the desired format and download the converted version
  • As a web service, invoked from your own application written in your favourite language (.NET, PHP, Python, Ruby, ...)
JODConverter is open source software released under the terms of the LGPL and can be downloaded from SourceForge.net.

Starting OpenOffice as a service

JODConverter needs to connect to a running OpenOffice.org instance in order to perform the document conversions. This is different from starting the OpenOffice.org program as you normally would. OpenOffice.org can be configured to run as a service and listen for commands on a TCP port. One way of doing this is to run the following command on Linux (you only need to adjust the location of soffice):
/usr/bin/soffice "-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -norestore -nofirststartwizard -nologo -headless &
I suggest putting this in a script in /etc/init.d/ so it runs automatically.
Note that you cannot open OpenOffice.org normally while this service is running in headless mode.
If you are running your system on Windows, you can read here for information on how to create a service on Windows.
See the Uno/FAQ on the OpenOffice.org Wiki for more on this topic.
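Because JODConverter 2 needs an already running instance, it can be handy to check at startup that something is listening on port 8100 before you connect, so a missing service gives a clear error early. A plain-JDK sketch; the class and method names are hypothetical, and a successful check only proves that some listener exists, not that it is OpenOffice.org.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical startup check for the headless OpenOffice.org service
final class OfficeServiceCheck {
    /** Returns true if something accepts TCP connections on host:port within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out or unknown host
        }
    }

    public static void main(String[] args) {
        // 8100 is the port from the -accept string used to start soffice above
        System.out.println(isListening("localhost", 8100, 2000));
    }
}
```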

Command-Line Tool

You can also run JODConverter as a command-line (CLI) program.
To use it as a command-line tool, you need to download the 2.2.2 distribution, unpack it, and run it using Java. (Adjust the JAR name in the commands below to match the version you downloaded.)
To convert a single file, specify the input and output files as parameters:
java -jar lib/jodconverter-cli-2.2.0.jar document.doc document.pdf
To convert multiple files to a given format, specify the format using the -f (or --output-format) option and then pass the input files as parameters:
java -jar lib/jodconverter-cli-2.2.0.jar -f pdf *.odt
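If you would rather call the CLI from Java than from a shell script, keep in mind that ProcessBuilder does not go through a shell, so a wildcard like *.odt is not expanded; you have to list the input files yourself. A minimal sketch; the class and method names are hypothetical, and the JAR path is the one from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the JODConverter command-line tool
final class JodCliCommand {
    /** Equivalent of: java -jar <jarPath> -f <format> file1 file2 ... */
    static List<String> convertAll(String jarPath, String format, List<String> inputFiles) {
        List<String> command = new ArrayList<String>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.add("-f");
        command.add(format);
        command.addAll(inputFiles); // no shell, so no *.odt expansion
        return command;
    }

    public static void main(String[] args) throws Exception {
        List<String> command = convertAll("lib/jodconverter-cli-2.2.0.jar", "pdf",
                Arrays.asList("report.odt", "letter.odt"));
        // inheritIO() shows the converter's output in this console
        Process process = new ProcessBuilder(command).inheritIO().start();
        System.out.println("exit code: " + process.waitFor());
    }
}
```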

Usage in your Java applications
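Using JODConverter in your own Java application is very easy. The following example shows the skeleton code required to perform a one-off conversion from a Word document to PDF:
import java.io.File;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
 
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
try {
    // convert
    DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
    converter.convert(inputFile, outputFile);
} finally {
    // close the connection, even if the conversion fails
    connection.disconnect();
}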

To convert the same document to PDF/A instead, you need to create a custom DocumentFormat of type PDF/A and pass it to the convert method like this:
/**
 * Returns DocumentFormat of PDF/A
 */
private DocumentFormat toDocumentFormatPDFA() {
    // These are the different PDF versions you can get. 1 is the default PDF/A
    final int PDFXNONE = 0;
    final int PDFX1A2001 = 1;
    final int PDFX32002 = 2;
    final int PDFA1A = 3;
    final int PDFA1B = 4;
    // create a PDF DocumentFormat (as normally configured in document-formats.xml)
    DocumentFormat customPdfFormat = new DocumentFormat("Portable Document Format", "application/pdf", "pdf");

    // now set our custom options
    customPdfFormat.setExportFilter(DocumentFamily.TEXT, "writer_pdf_Export");
    /*
     * For some reason "PDF/A-1" is called "SelectPdfVersion" internally; maybe they plan to add other
     * PdfVersions later.
     */
    final Map<String, Integer> pdfOptions = new HashMap<String, Integer>();
    pdfOptions.put("SelectPdfVersion", PDFX1A2001);
    customPdfFormat.setExportOption(DocumentFamily.TEXT, "FilterData", pdfOptions);
    return customPdfFormat;
}
And then you call the convert method with toDocumentFormatPDFA() as parameter.
converter.convert(inputFile, outputFile, toDocumentFormatPDFA());

Note that this is a very simple example. I do not recommend opening and closing the connection for each conversion. Open it once when the application starts (or the first time you need to convert), and close it when the application shuts down.

Monday, April 5, 2010

Advanced Seam series part 3 of 3: Asynchronous mail sending


In this last part of the series I will show how easily you can set up your Seam environment to send mail asynchronously. You can also raise other events asynchronously.

Seam makes it very easy to perform work asynchronously from a web request.

Seam layers a simple asynchronous method and event facility over your choice of dispatchers:
  • java.util.concurrent.ScheduledThreadPoolExecutor (by default)
  • The EJB timer service (for EJB 3.0 environments)
  • Quartz
Asynchronous configuration
The default dispatcher, based on a ScheduledThreadPoolExecutor, performs efficiently but does not guarantee that the task will ever actually be executed. This is because it has no persistent state: it only keeps the events in memory. If your application server goes down before the calls are executed, they will never run.
If the task is not critical, e.g. a cleanup task or something trivial, there is no reason not to use the default dispatcher.
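To see why an in-memory dispatcher gives no guarantee, here is a plain-JDK sketch. It is not Seam code, and the class name is hypothetical; it only uses a ScheduledThreadPoolExecutor, the same kind of executor the default dispatcher is built on. A task scheduled for later is simply handed back unexecuted when the executor is shut down, and after a real crash you would not even get that list back.

```java
import java.util.List;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: pending tasks live only in the executor's memory
final class InMemoryDispatchDemo {
    static int tasksLostOnShutdown() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        Runnable sendMail = () -> System.out.println("sending mail"); // never printed
        executor.schedule(sendMail, 1, TimeUnit.HOURS);
        // Simulate the server going down: shutdownNow() returns the tasks that never started
        List<Runnable> neverRun = executor.shutdownNow();
        return neverRun.size();
    }

    public static void main(String[] args) {
        System.out.println("tasks lost: " + tasksLostOnShutdown()); // tasks lost: 1
    }
}
```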

However, if you want the guarantee that the task is called you might use the Timer Service.
If you're working in an environment that supports EJB 3.0, you can add the following line to components.xml:
<async:timer-service-dispatcher/>
Then your asynchronous tasks will be processed by the container's EJB timer service. The Timer Service implementation is persistence-based, so you get some guarantee that the tasks will eventually run.

Finally, your third option is to use an open source alternative called Quartz (recently acquired by Terracotta). To use Quartz, you will need to bundle the Quartz library JAR (found in the lib directory) in your EAR and declare it as a Java module in application.xml. The Quartz dispatcher may be configured by adding a Quartz property file to the classpath; it must be named seam.quartz.properties. In addition, you need to add the following line to components.xml to install the Quartz dispatcher:
<async:quartz-dispatcher/>
Note that Quartz uses RAMJobStore by default, so out of the box it is not persistence-based either. You will need to configure a persistent (JDBC-based) job store yourself.
It is up to the reader to choose whichever asynchronous strategy they see fit.

Sending emails Asynchronously
Sending a plain email with Seam is really easy. All you have to do is use Seam Mail and annotate the method (and the interface, if you are using EJBs) with @Asynchronous, and you're done. The tricky part is using EL expressions and variables stored in the different contexts, so that you can produce dynamic emails.

There are two ways to call a task asynchronously. You can either raise an asynchronous event or, as explained above, annotate a method with @Asynchronous and call it normally, e.g.:
Events.instance().raiseAsynchronousEvent("sendOneTimepassword", user, theOnetimepass);
//Or the normal way
@In MailService mailService;
mailService.sendSupport(sender,supportEmail,supportMessage);
Let's say you have the following stateless EJB that is responsible for sending emails:

import javax.ejb.Local;
import javax.ejb.Stateless;

import org.jboss.seam.annotations.AutoCreate;
import org.jboss.seam.annotations.In;
import org.jboss.seam.annotations.Name;
import org.jboss.seam.annotations.Observer;
import org.jboss.seam.annotations.async.Asynchronous;
import org.jboss.seam.contexts.Contexts;
import org.jboss.seam.faces.Renderer;

/**
 * This service is responsible for sending emails asynchronously
 * @author Shervin Asgari
 *
 */
@Stateless
@Local(MailService.class)
@Name("mailService")
@AutoCreate
public class MailServiceImpl implements MailService {

    @In(create = true)
    private Renderer renderer;
    
    @Asynchronous
    public void sendSupport(User sender, String supportEmail, String supportMessage) {
        Contexts.getEventContext().set("sender", sender);
        Contexts.getEventContext().set("supportEmail", supportEmail);
        Contexts.getEventContext().set("supportMessage", supportMessage);
        renderer.render("/generic/email-template/support-email-template.xhtml");
    }
    
    @Observer("sendOneTimepassword")
    //@Asynchronous: we don't need this annotation, since we are raising the event asynchronously
    public void sendOneTimepassword(User emailUser, String oneTimePassword) {
        Contexts.getEventContext().set("emailUser", emailUser);
        Contexts.getEventContext().set("oneTimePassword", oneTimePassword);
        renderer.render("/generic/email-template/onetimepassword-email-template.xhtml");
    }    
}
One of the methods, sendSupport(), sends support messages as email, while the other, sendOneTimepassword(), sends generated one-time passwords to a user so that they can authenticate to the system. Have a look here for information on how you can set up your system to do exactly that.

The key point to note is Contexts.getEventContext().set("emailUser", emailUser);. If I had simply injected the user into the MailService, it would not work. Since the call is made asynchronously, we need to set the variables in the event context of the asynchronous call. Then we can get hold of them in the email template:


<m:message 
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:ui="http://java.sun.com/jsf/facelets"
    xmlns:m="http://jboss.com/products/seam/mail"
    xmlns:f="http://java.sun.com/jsf/core"
    xmlns:h="http://java.sun.com/jsf/html">

    <m:from name="Someone">no-reply@someplace.com</m:from>
    <m:to name="#{emailUser.name}" address="#{emailUser.fromEmail}" />
    <m:subject>#{messages['mail.onetime.heading']}</m:subject>
    <m:body>
        Your one time password is: <i>#{oneTimePassword}</i>
        <p>#{messages['mail.onetime.noreply']}</p>
    </m:body>
</m:message>
That's really all there is to it! Now your emails will be sent asynchronously.
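Why copy the values into the event context at all? Here is a rough plain-Java analogy, not how Seam implements its contexts: a ThreadLocal stands in for a thread-bound context, and the class name is hypothetical. A value set on the calling thread is invisible on the worker thread that runs the asynchronous task unless you pass it along and set it there, which is roughly what the Contexts.getEventContext().set(...) calls in MailServiceImpl do with the method arguments.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical analogy: a ThreadLocal plays the role of a thread-bound context
final class ContextPropagationDemo {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Returns {what the worker sees implicitly, what it sees when the value is passed along}. */
    static String[] run() throws Exception {
        CONTEXT.set("alice");        // set on the calling ("request") thread
        String user = CONTEXT.get(); // capture it as a plain value

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // 1) Relying on the caller's context: the worker thread has its own, empty copy
            Callable<String> implicit = () -> String.valueOf(CONTEXT.get());
            // 2) Passing the value as an argument and setting it in the worker's own context
            Callable<String> explicit = () -> {
                CONTEXT.set(user);
                try {
                    return CONTEXT.get();
                } finally {
                    CONTEXT.remove(); // pooled threads are reused, so clean up
                }
            };
            return new String[] {executor.submit(implicit).get(), executor.submit(explicit).get()};
        } finally {
            executor.shutdown();
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] seen = run();
        System.out.println("implicit: " + seen[0]); // implicit: null
        System.out.println("explicit: " + seen[1]); // explicit: alice
    }
}
```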
