
Format conversion extension for FLOSS office suites

Project Details

Licensing

GPL V2

Purpose of this project

The aim of this project is to create a LibreOffice and Apache OpenOffice extension that contacts a Web service to convert between OpenDocument (.odt, .odp, .ods), Office Open XML (OOXML) (.docx, .pptx, .xlsx) and Microsoft Office binary (.doc, .ppt, .xls) formats. Besides the extension, the project includes the creation of the Web service itself. The Web service uses the LibreOffice conversion engine, which may be extended in the future to refine the conversion.

Motivation for the project

Although both Office Open XML and OpenDocument are open standards, and LibreOffice offers the best conversion engine available, conversion between the two still has many flaws (see the following figure). Surprisingly, conversion between the closed Microsoft Office binary formats and OpenDocument works better, thanks to the past reverse engineering done by Sun Microsystems. LibreOffice developers still have a long way to go before a proper conversion engine gets into the hands of office professionals.

[Figure: conversion errors]

This project tries to cater for this issue by allowing users to convert their documents through an external conversion engine hosted as a Web service. Such an engine might fork and improve on LibreOffice's engine, allowing users to take advantage of a better engine without having to install a different LibreOffice version. For mobile devices and other systems with limited processing power, using an external Web service might also speed up the conversion, especially for complex documents.

Another advantage of this system is that it can become a basic building block for an analysis tool that gathers anonymous statistics from the documents submitted for conversion. Such statistics might, for instance, list which attributes (tables, enumerations, images, etc.) are most frequently used, which is very useful information for prioritizing the development of features in the conversion engine.

Project description

As mentioned, the project is split into two major modules: the extension and the Web service provider.

The LibreOffice and Apache OpenOffice extension is written in Java, which seems to be the best language for working with Universal Network Objects (UNO): UNO's documentation mostly uses Java-based examples, and there is a larger code base of examples and working extensions.

The server-side software also uses Java to implement a RESTful web service, which in turn issues conversion requests to a LibreOffice installation running in server mode.

Both components use Jersey, an implementation of JAX-RS (Java API for RESTful Web Services), to build the communication layer.

The development environment was Eclipse on a GNU/Linux system, with the addition of OOEclipse, a plugin that facilitates development with UNO and LibreOffice's API.

Server API

The server implements a REST API, which is published in a WADL (Web Application Description Language) file like the one that follows.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<application xmlns="http://wadl.dev.java.net/2009/02">
    <doc xmlns:jersey="http://jersey.java.net/" jersey:generatedBy="Jersey: 1.13 06/29/2012 05:14 PM"/>
    <grammars/>
    <resources base="http://localhost:9998/">
        <resource path="/convert/{output_format: odt|docx|doc|ods|xlsx|xls|odp|pptx|ppt}">
            <param xmlns:xs="http://www.w3.org/2001/XMLSchema" type="xs:string" style="template" name="output_format"/>
            <method name="POST" id="convert">
                <request>
                    <representation mediaType="multipart/form-data"/>
                </request>
                <response>
                    <representation mediaType="application/vnd.oasis.opendocument.text"/>
                    <representation mediaType="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/>
                    <representation mediaType="application/vnd.ms-word"/>
                    <representation mediaType="application/vnd.oasis.opendocument.spreadsheet"/>
                    <representation mediaType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"/>
                    <representation mediaType="application/vnd.ms-excel"/>
                    <representation mediaType="application/vnd.oasis.opendocument.presentation"/>
                    <representation mediaType="application/vnd.openxmlformats-officedocument.presentationml.presentation"/>
                    <representation mediaType="application/vnd.ms-powerpoint"/>
                </response>
            </method>
        </resource>
    </resources>
</application>

To query the Web service, an application just needs to POST an HTML form that includes the document to convert, encoded as multipart/form-data. The request must be directed to a URL of the form http://<server_address>/convert/<output_format>, where output_format is one of odt, docx, doc, ods, xlsx, xls, odp, pptx or ppt. The conversion only takes place if the LibreOffice conversion engine supports converting from the input document format to the desired output format, that is, if the formats are of the same kind (e.g. both are spreadsheet formats).
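The "same kind of format" rule can be illustrated with a small lookup from file extension to document family. This is only a sketch: the class and method names (FormatFamilySketch, familyOf, sameFamily) are hypothetical and not taken from the project's sources, and the real decision is made by the LibreOffice conversion engine, not by a table like this one.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the project's code base.
final class FormatFamilySketch {

    enum Family { TEXT, SPREADSHEET, PRESENTATION }

    // The nine output formats accepted by the /convert/{output_format} resource.
    private static final Map<String, Family> FAMILIES = Map.of(
        "odt", Family.TEXT, "docx", Family.TEXT, "doc", Family.TEXT,
        "ods", Family.SPREADSHEET, "xlsx", Family.SPREADSHEET, "xls", Family.SPREADSHEET,
        "odp", Family.PRESENTATION, "pptx", Family.PRESENTATION, "ppt", Family.PRESENTATION);

    // Returns the family of a file extension, or null if the extension is unknown.
    static Family familyOf(String extension) {
        return FAMILIES.get(extension.toLowerCase(Locale.ROOT));
    }

    // True only when both extensions are known and belong to the same family.
    static boolean sameFamily(String inputExtension, String outputExtension) {
        Family input = familyOf(inputExtension);
        return input != null && input == familyOf(outputExtension);
    }
}
```

Under this sketch, sameFamily("docx", "odt") holds while sameFamily("xls", "odt") does not, which mirrors the spreadsheet example given above.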

For instance, a page with the following HTML code will query the Web service to convert the input document into odt (OpenDocument Text) format.

<html>
	<body>
		<h1>Convert docx to odt</h1>
		<form action="http://localhost:9998/convert/odt" method="post" enctype="multipart/form-data">
			<p>Select a file : <input type="file" name="file" size="45" /></p>
			<input type="submit" value="Upload It" />
		</form>
	</body>
</html>
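The same request can be issued from code. The following is a minimal sketch of a client using the JDK's java.net.http API (Java 11 or later), not the Jersey-based client used by the extension. The class name ConvertClientSketch and its methods are hypothetical. The form field name "file" is taken from the HTML form above. Treating any 2xx status as success is an assumption about the server's behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical client, not part of the project's code base.
final class ConvertClientSketch {

    // Builds http://<server_address>/convert/<output_format>.
    static URI convertUri(String serverBase, String outputFormat) {
        String base = serverBase.endsWith("/") ? serverBase : serverBase + "/";
        return URI.create(base + "convert/" + outputFormat);
    }

    // Encodes one file as a multipart/form-data body with a single part named "file".
    static byte[] multipartBody(String boundary, String fileName, byte[] content) {
        String head = "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
            + "Content-Type: application/octet-stream\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // POSTs the input file and stores the response body in the output file.
    static void convert(String serverBase, String outputFormat, Path input, Path output)
            throws IOException, InterruptedException {
        String boundary = "----sketch" + System.nanoTime();
        byte[] body = multipartBody(boundary, input.getFileName().toString(),
                                    Files.readAllBytes(input));
        HttpRequest request = HttpRequest.newBuilder(convertUri(serverBase, outputFormat))
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(body))
            .build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofFile(output));
        // Assumption: the server answers with a 2xx status when the conversion succeeds.
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Conversion failed with HTTP status " + response.statusCode());
        }
    }
}
```

A call such as ConvertClientSketch.convert("http://localhost:9998", "odt", Path.of("report.docx"), Path.of("report.odt")) would correspond to submitting the form above, where report.docx is a placeholder file name.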

Extension interface

The conversion functionality inside the office suite can be accessed through the File menu in the menu bar or through a toolbar button, as depicted in the images below.

After clicking either of these, the user is only asked for the desired save location and output format.

The Web service is then contacted and the converted document opens in a new window.

Download, build, run and modify

Common dependencies: LibreOffice or Apache OpenOffice and the corresponding SDK (e.g. the packages libreoffice-base-core and libreoffice-dev for LibreOffice on Debian-based distributions), plus Subversion to check out the repositories.

Conversion server

Dependencies: maven2

Download:

svn co http://contribsoft.caixamagica.pt/repo/internals/2012/officeplugin/trunk/conversion_server conversion_server

In pom.xml, check that the properties oo.ure.java and oo.ure.java point to the directories containing the office suite jar files.

Compile and run:

mvn compile
mvn exec:java

in the project directory.

To open the project in Eclipse, install m2e - Maven Integration for Eclipse.

To change the conversion engine used by the server, you only need to modify the method convertAndSave(String outputFormat, String uploadedFileURL, String convertedFileURL) inside the com.caixamagica.conversion_server.resources.ConvertFileService class.
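When a conversion is delegated to LibreOffice through UNO, the target format is normally selected by an export filter name. The sketch below maps the nine supported output formats to the filter names commonly used by LibreOffice. How convertAndSave actually selects its filter is not documented here, so the class ExportFilterSketch and the method filterFor are hypothetical, and the filter names should be checked against the installed office suite version.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of ConvertFileService.
final class ExportFilterSketch {

    // Output format -> LibreOffice export filter name (verify against your installation).
    private static final Map<String, String> FILTERS = Map.of(
        "odt", "writer8",
        "docx", "MS Word 2007 XML",
        "doc", "MS Word 97",
        "ods", "calc8",
        "xlsx", "Calc MS Excel 2007 XML",
        "xls", "MS Excel 97",
        "odp", "impress8",
        "pptx", "Impress MS PowerPoint 2007 XML",
        "ppt", "MS PowerPoint 97");

    // Returns the filter name for a supported output format, or empty if unsupported.
    static Optional<String> filterFor(String outputFormat) {
        return Optional.ofNullable(FILTERS.get(outputFormat));
    }
}
```

A replacement engine would only need to honour the same contract as convertAndSave: read the uploaded file, write the converted file in the requested format.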

Conversion server exec process

This is an old version of the Conversion server which, instead of using UNO to contact a running office instance, starts a new LibreOffice process each time a conversion is requested. It therefore consumes more resources and is listed here for testing purposes only. For production, the current version should be preferred.
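The process-per-request approach can be sketched with ProcessBuilder and LibreOffice's headless command-line converter. This is an illustration under assumptions: the exact command line used by the old server is not shown on this page, the soffice executable is assumed to be on the PATH, and the class name ExecConversionSketch and the two-minute timeout are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the code of the old conversion server.
final class ExecConversionSketch {

    // Builds: soffice --headless --convert-to <format> --outdir <dir> <input>
    static List<String> command(String outputFormat, Path input, Path outputDir) {
        return List.of("soffice", "--headless", "--convert-to", outputFormat,
                       "--outdir", outputDir.toString(), input.toString());
    }

    // Starts one office process for this single conversion and waits for it to finish.
    static boolean convert(String outputFormat, Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(outputFormat, input, outputDir))
            .inheritIO()
            .start();
        // Arbitrary timeout chosen for the sketch; kill the process if it hangs.
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

Every call pays the start-up cost of a full office process, which is the resource overhead described above.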

Download:

svn co http://contribsoft.caixamagica.pt/repo/internals/2012/officeplugin/tags/conversion_server_exec_proc

Compile and run:

mvn compile
mvn exec:java

in the project directory.

To open the project in Eclipse, install m2e - Maven Integration for Eclipse.

Conversion extension

Download:

svn co http://contribsoft.caixamagica.pt/repo/internals/2012/officeplugin/trunk/conversion_extension conversion_extension

Preparing SDK environment:

source /usr/lib/libreoffice/sdk/setsdkenv_unix.sh

Compile and deploy:

make

To open the project in Eclipse, preferably install OOEclipse and configure it as follows on a Debian-based distribution:

  • Path to SDK: /usr/lib/libreoffice/sdk
  • Path to OOo installation: /usr/lib/libreoffice/ure-link

If you do not wish to build the extension from source code there is a packaged version available at ConversionExtension.oxt. To change the conversion server address do the following before installing the extension:

echo server_address = http://myserver_address > settings.properties
zip -u ConversionExtension.oxt settings.properties
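A file written this way follows the standard Java properties syntax, so it can be read with java.util.Properties. The sketch below shows one way the server_address key could be read. It is not the extension's actual loading code: the class name ServerSettingsSketch is hypothetical, and the fallback to http://localhost:9998 (the base address shown in the WADL file) is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the extension's sources.
final class ServerSettingsSketch {

    // Assumed default, taken from the base address in the WADL example.
    static final String DEFAULT_ADDRESS = "http://localhost:9998";

    // Parses the text of settings.properties and returns the configured server address.
    static String serverAddress(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty("server_address", DEFAULT_ADDRESS).trim();
    }
}
```

The spaces around the equals sign produced by the echo command are accepted by the properties parser and do not become part of the key or the value.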

Planned road-map

1st week (16/07 - 20/07): Research and planning

  • Research the technologies needed for the project.
  • Write first version of this wiki.
  • Setup the development environment.

2nd week (23/07 - 27/07): Web service provider prototype

  • Project planning.
  • Risk identification.
  • Prepare internal presentation.
  • Jersey is built using Maven. Assess whether it's worth using this technology and set it up if so.
  • Read REST and JAX-RS documentation.
  • Create a RESTful web service able to receive and store a file, convert it with LibreOffice's engine and send the output file back to the client.

3rd week (30/07 - 03/08): Holidays

4th week (06/08 - 10/08): Extension prototype

  • Read Uno and LibreOffice API documentation.
  • Read OOEclipse documentation and find how to create, compile and run a test extension (e.g. Hello world!).
  • Create a simple extension adding a new entry to the File menu (e.g. "Convert").

5th week (13/08 - 17/08): Integrate conversion into Save as dialog

  • Check whether it's possible to add a new file type into the Save as dialog and call a web service during the save action.
  • If so implement the solution above. If not create a new dialog called "Convert" and implement the web service call there.

6th week (20/08 - 24/08): Add conversion for the main file formats

  • Add functionality in both server and client to support conversion for the most used office file formats.
  • Document and publish the web service REST API.

7th week (27/08 - 31/08): Fine tune and preparation for future extension

  • Check if it is possible to run LibreOffice as a background service and issue requests to the conversion engine.
  • Research and document how it's possible to modify LibreOffice's conversion engine. Make a small modification and publish it as an example.
  • Research and document an approach to receive conversion error reports from users, with the possibility of restricting a report to a document section instead of the entire document. Find a way to associate each error report with the corresponding web service request and documents and store them for future analysis.

8th week (03/09 - 07/09): Testing, debug, code cleaning and documentation

  • Update the wiki.

9th week (10/09 - 14/09): Documentation and delivery

  • Finish wiki.
  • Prepare final presentation (with demonstration).

Actual road-map

1st week (16/07 - 20/07): Research and planning

  • Research the technologies needed for the project.
  • Write first version of this wiki.
  • Setup the development environment.

2nd week (23/07 - 27/07): Web service provider prototype

  • Project planning.
  • Risk identification.
  • Prepare internal presentation.
  • Jersey is built using Maven. Assess whether it's worth using this technology and set it up if so. [It is worth it; set up in the server.]
  • Read REST and JAX-RS documentation. [Read only selected sections.]
  • Create a RESTful web service able to receive and store a file, convert it with LibreOffice's engine and send the output file back to the client.

3rd week (30/07 - 03/08): Holidays

4th week (06/08 - 10/08): Extension prototype

  • Read Uno and LibreOffice API documentation. [Too big to read as a whole. Opted to follow examples and use the documentation for consultation only.]
  • Read OOEclipse documentation and find how to create, compile and run a test extension (e.g. Hello world!). [Impossible, OOEclipse is broken.]
  • Create, compile and run a test extension through other means. [Done by using the ProtocolHandlerAddon example from the SDK Developer's Guide as a base.]

5th week (13/08 - 17/08): Extension prototype

  • Tried to adapt ProtocolHandlerAddon to this project's needs. Worked on the Makefile and configuration XML files and found 3rd party bugs related to the jar packager and the internal class name.

6th week (20/08 - 24/08): Integrate extension into LibreOffice's GUI

  • Find out whether it's possible to add new file types to the Save as dialog. [Impossible.]
  • Find if it's possible to intercept the Alien Warning dialog. [Impossible. Call is hardcoded in guisaveas.cxx]
  • Remove ProtocolHandlerAddon's entry from the Addons menu.
  • Add new entries to the menu bar File menu (e.g. "Convert") and to the toolbar.

7th week (27/08 - 31/08): Add functionalities to the extension

  • Convert only if document is saved in some format. If not warn the user.
  • Show menu and toolbar entries only in the 3 major office suite components (Writer, Calc and Impress).
  • Check if document was modified and save it before conversion.
  • Check current document format and show only the valid conversion options.
  • Found mock-up that runs LibreOffice as a background service and issues requests to the conversion engine.

8th week (03/09 - 07/09): Add functionalities to the extension and server

  • Add functionality in both server and client to support conversion for the most used office file formats.
  • Add icon designed by Teresa.
  • Completed extension description.xml.

9th week (10/09 - 14/09): Testing, code cleaning, documentation and extras

  • Add entry in LibreOffice options menu to set the server URL. [Failed due to persistent bug when packing the jar and registering multiple java classes. Check Bugs.]
  • Re-write server and use LibreOffice as a background service and issue requests to the conversion engine. [Done through the use of UNO services.]
  • Replace and add license notices in source code files.
  • Write building instructions in a README for both server and extension.
  • Document and publish the web service REST API.
  • Update wiki.
  • Prepare final presentation (with demonstration).

TO DO

In both components

  • Mechanism to receive conversion error reports from users, with the possibility of restricting a report to a document section instead of the entire document. Find a way to associate each error report with the corresponding web service request and documents and store them for future analysis.

Conversion server

  • Improve LibreOffice's conversion engine.
  • Document analysis tool to find the most used tags.
  • Don't store the received documents in the file system. Work with input and output streams in memory or use memory mapped files.
  • Add more unit tests.

Conversion extension

  • Show progress bar while the document is converted.
  • Grey out toolbar icon when the document is empty.
  • Add accelerator key shortcut (e.g. Ctrl+Shift+C for conversion).
  • Add help pages.
  • Replace the Makefile build environment with Maven, which handles dependencies better.
  • Organize class files into packages (bug prone due to the component registration inside the office suite).

Risk Mitigation, Monitoring and Management Plan

Risks Identified

  1. Outdated IDE plugins. There are two IDE plugins that facilitate development with LibreOffice's API: the OpenOffice.org API Plugin for Netbeans and OOEclipse. The Netbeans plugin has not seen a new version since 2010, though it was the official IDE plugin developed by Sun for OpenOffice.org and is rich in features and documentation (e.g. OpenOffice NetBeans Integration). OOEclipse is almost a one-man project which has not received much attention in the last couple of years either. It is scarce in features and documentation, but it has received minor updates and its development version supports LibreOffice. Furthermore, the author is still active in the community and answers questions about the plugin. As I have many hours of experience with Eclipse and none with Netbeans, I decided to stick with OOEclipse for now, which allowed me to set up the development environment much faster.
  2. I do not know the degree to which an extension is able to customize LibreOffice, and therefore whether it is possible to integrate the new functionality as I envision.

Risk Mitigation plan

  1. During the 4th week, when the LibreOffice extension development starts, I will check whether the Netbeans plugin would allow me greater productivity by continuously consulting its documentation. If I find it beneficial, I will accept the cost of installing Netbeans and its OpenOffice.org plugin and migrating the project to it.
  2. As stated in the 5th week planning, I am considering at least two different ways to make the conversion functionality visible to the user. Furthermore, as all LibreOffice extensions available online are open source, one can check the solutions found by other extension developers to similar issues.

Risk Management (actual)

  1. After finding the Developer's Guide examples to be a good alternative to the malfunctioning OOEclipse, I focused all my attention on adapting them to my needs. This task proved to be much more difficult than I expected, and at this point it would have been wise to test how the OpenOffice.org API Plugin for Netbeans worked. Later, however, while testing an example extension built with Netbeans, I found that this development environment is also critically outdated.
  2. Finding a good place to embed the extension controls in the office suite GUI turned out to be an easy task due to the wide range of available options.

Interaction with the community / Incentives to collaboration

Publicize the extension and service among administrative professionals and others who use LibreOffice or OpenOffice and have to perform format conversions frequently, for instance employees of Caixa Mágica's clients. Get feedback from them and, in the long term, check whether they prefer the web service conversion over the one embedded in the office suite.

Bugs

Here follows a short list of the most hindering bugs (to my development effort) which I found in 3rd party applications. I'm reporting them here so those who might want to modify this project may avoid them.

  • The MANIFEST.MF file inside a jar file must have a maximum of 72 bytes per line. To accomplish this, the jar archiver wraps lines in the middle of words, turning them into garbage. If lines are wrapped manually, the jar archiver joins them and repeats the same mistake. If the same property is repeated multiple times, only the first entry is taken into account by the Java VM. For instance, with a MANIFEST.MF like this:
    Class-Path: lib1.jar
    Class-Path: lib2.jar
    

only lib1.jar will be in the class-path. If the MANIFEST.MF file is supplied alongside the class files instead of through the -m option, it will be ignored. See "multiple Class-Path entries in jar Manifest not handled properly". Solution: use a zip archiver (Info-ZIP) to create the jar file.

  • The class ConversionExtension has an internal class which implements most of the extension functionality. Currently this class is named ProtocolHandlerAddonImpl, as in the base example used to build this extension. If the name of this class is changed, for instance to ConversionExtensionImpl, then even after updating all references to it (currently found only in ConversionExtension.xcu) you will get a "This operation is not supported by this operating system." message when clicking the extension buttons.
  • In the jar archive shipped inside the extension archive (OXT), using a central registration class to register multiple classes that implement the desired services does not work. Check the example:
    svn export http://svn.services.openoffice.org/ooo/contrib/sdk/examples/java/OptionsPageDemo
    

and if you are able to run it, please contact me. The following chapters of the Developer's Guide might be useful: Write Registration Info Using Helper Method and Providing a Single Factory Using Helper Method.
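Regarding the first bug in the list above, the JAR manifest format expects several class-path entries to be given as one space-separated Class-Path value, and a continuation line starts with a single space. The sketch below uses java.util.jar.Manifest to write such a value, which gets wrapped at 72 bytes, and to read it back unchanged. The jar names are placeholders, the class name ManifestClassPathSketch is hypothetical, and whether the office suite's component loader treats wrapped lines the same way was not verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the extension's build.
final class ManifestClassPathSketch {

    // Serializes a manifest whose only custom attribute is a space-separated Class-Path.
    static byte[] write(String classPath) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.CLASS_PATH, classPath);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out); // long lines are wrapped with "\r\n " continuations
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Parses manifest bytes and returns the rejoined Class-Path value.
    static String readClassPath(byte[] manifestBytes) {
        try {
            return new Manifest(new ByteArrayInputStream(manifestBytes))
                .getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a single Class-Path attribute there is no repeated property for the Java VM to ignore, so all listed jars are visible to a reader that follows the manifest format.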

Mentor information

Company

Caixa Mágica Software

Company description

Caixa Mágica is one of the open source projects with the longest history in Portugal.

Born in a college environment at ISCTE in 2000, it has been growing steadily for the last eight years, supported by a set of visions, mission and values and a strategy that help maintain the focus on open source technologies.

In 2004 a spin-off company was started, which maintains a strong relationship with ADETTI through a shared development contract. The company, Caixa Mágica Software, had positive results in 2004 and has been growing 30% each year.

Today, Caixa Mágica has 15 to 20 collaborators distributed across three main areas:

  • Product: engineering team that develops the Linux Caixa Mágica distribution. Currently, about 900 units are sold each month, spread across online sales, stores and special programs.
  • Outsourcing: projects that highly demand open source technologies and where our professionals are an added value.
  • Research: European and National projects that feed technology and competence to other business units. At the moment we have a cycle of three years from the initial research to product availability.

In addition to the three main areas, Caixa Mágica has three smaller but growing areas: Training, Professional Services and Appliances.

Mentor

Flávio Moringa

flav...@caixamagica.pt

Trainee details

David Ludovino

davi...@gmail.com

Past experience

The knowledge needed for this project was mostly acquired over the past 4 years in an academic environment. The trainee learnt about Web services and the related technologies (SOAP, WSDL, etc.) in courses such as Distributed Systems and coded in Java for this and several other courses.

Current situation

Information Systems and Computer Engineering MSc student at Instituto Superior Técnico (IST) in Lisboa, with Distributed Systems as major area, Embedded Systems as minor area and the dissertation topic "AirSensor – sensor network for aircraft monitoring".

References

Jersey

Jersey User Guide

File upload example in Jersey

LibreOffice and OpenOffice programming

Libreoffice-qa Document conversion engine

OpenOffice.org Developer's Guide

The OpenOffice.org API Project

Debugging an OOo extension with Eclipse

OpenOffice API Programming in Java

How to get Locale info using openoffice java API

OOo save as dialog filter types issue

Extension Path

FileNotFoundException Win XP despite apparently valid path

Last modified on May 29, 2013, 4:50:10 PM
