November 11th, 2008

Extending Eclipse: displaying HTML content from a bundled archive

How is the Eclipse Help view implemented? The view is a rich collection of interlinked documents with the usual features: embedded images and navigation between pages. In addition, it supports live help actions, which are hyperlinks that call Eclipse actions (Java code).

In a usual setup, the HTML content is stored in the file system. If the requested HTML page contains embedded images, these are stored as separate files and requested separately by the browser. However, the main content of the Eclipse Help view comes from a single file: org.eclipse.platform.doc.user.jar in the plugins folder. This jar contains around one thousand files (in Eclipse 3.4), including HTML pages and PNG images. How do these get displayed in the Eclipse Help view?

A straightforward approach would be to “explode” the contents of this jar at runtime (the first time the Help view is activated) and then link directly to the files in the file system. This is simple to implement, but requires some bookkeeping to clean up the files when the workspace is closed. You also pay a performance penalty the first time, when the files are extracted from the archive and written to disk. Instead, Eclipse 3.4 uses an embedded Jetty HTTP server with custom servlets to serve the content of the Help view.

While the implementation of the Help view sits deep inside the internal packages of the org.eclipse.help.webapp plugin, the same functionality can be recreated using only public extension points (which keeps it compatible with future Eclipse versions). The steps below describe the general setup of an embedded Help HTTP server.

First, you need to specify three extension points in your plugin.xml. For the complete example see the plugin.xml of the org.eclipse.help.webapp plugin. The extension points are:

Jumping ahead a little (the complete structure is explained later), the URLs requested from the HTTP server will look like this:

http://127.0.0.1:12345/primaryID/secondaryID/relative/path/to/your.html

The primaryID is the HTTP context ID specified in the first extension point above; it is also used in the second and third extension points. The secondaryID lets you map your content through different servlets (see later). In the simplest case, where all content comes from the same archive, you will have only one servlet specified in the third extension point. The last identification string is specified in the second extension point: it is the other.info filter on the service selector. This string must be the same as the one set during the initialization of the Jetty server (see below).
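
To make the URL layout concrete, here is a minimal sketch that builds such a URL and splits it back into its three parts. The HelpUrl class and its field names are hypothetical and exist only for illustration; nothing like it is part of the Eclipse API.

```java
import java.net.URI;

/** Hypothetical helper (not an Eclipse class) for the URL layout shown above. */
final class HelpUrl {
    final String primaryId;    // the HTTP context ID
    final String secondaryId;  // selects the servlet
    final String resourcePath; // path of the resource inside the archive

    HelpUrl(String primaryId, String secondaryId, String resourcePath) {
        this.primaryId = primaryId;
        this.secondaryId = secondaryId;
        this.resourcePath = resourcePath;
    }

    /** Builds the URL for a resource served by the local server on the given port. */
    static String build(int port, String primaryId, String secondaryId, String resourcePath) {
        return "http://127.0.0.1:" + port + "/" + primaryId + "/" + secondaryId + "/" + resourcePath;
    }

    /** Splits the URL path into primaryID, secondaryID and the trailing resource path. */
    static HelpUrl parse(String url) {
        String path = URI.create(url).getPath(); // e.g. "/primaryID/secondaryID/relative/..."
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("Unexpected URL layout: " + url);
        }
        // Limit of 3 keeps the remaining slashes inside the resource path.
        String[] parts = path.substring(1).split("/", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Unexpected URL layout: " + url);
        }
        return new HelpUrl(parts[0], parts[1], parts[2]);
    }
}
```

The trailing resourcePath is the part that is later handed to whatever loads the content from the archive.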

Your MANIFEST.MF will need two changes. The first is the Import-Package header, which should have the following entries:

  • javax.servlet
  • javax.servlet.http
  • org.osgi.service.http

The second is the Require-Bundle header, which should have the following entries:

  • org.eclipse.equinox.http.jetty
  • org.eclipse.equinox.http.servlet
  • org.eclipse.equinox.http.registry

These entries ensure that you can use the relevant classes in your custom Jetty server and servlets.

The next step is the class that controls the lifecycle of the embedded Jetty server. This is the 127.0.0.1:12345 part of the URL above: a local HTTP server listening on port 12345. Since this specific port may be taken by another application, we are going to ask Jetty to auto-select an available port. The complete implementation of the Jetty configurator can be found in the org.eclipse.help.internal.server.WebappServer class, and the main steps are:

  • Make sure that you’re running only one instance of the HTTP server (instead of creating a new instance for each HTTP request).
  • The http.port parameter should be set to 0 to allow Jetty to auto-select an available port.
  • The context.path parameter should be set to the HTTP context ID (primaryID in the example above).
  • The other.info parameter should be set to the same value as the service selector filter in the second extension point in the plugin.xml.
  • INFO / DEBUG messages of Jetty should be suppressed.
  • After starting Jetty, get the org.eclipse.equinox.http.registry bundle and check its state. RESOLVED means the bundle is resolved but not yet started, so start it in that case.
  • To get the Jetty port (for creating the URLs), get the service reference for the org.osgi.service.http.HttpService class using the (other.info=yourServiceSelectorFilter) filter. Then read its http.port property and cast it to Integer.
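
The settings from the list above can be collected in a plain java.util.Dictionary. The sketch below is limited to that part so that it runs without any Eclipse bundles on the classpath. The HelpServerSettings class and the example values are hypothetical, and the leading slash added to context.path is an assumption about what Jetty expects rather than something stated above.

```java
import java.util.Dictionary;
import java.util.Hashtable;

/** Hypothetical helper that gathers the Jetty settings described in the list above. */
final class HelpServerSettings {

    /** Builds the settings dictionary for the embedded server. */
    static Dictionary<String, Object> create(String contextId, String otherInfo) {
        Dictionary<String, Object> settings = new Hashtable<>();
        settings.put("http.port", Integer.valueOf(0));  // 0 asks for any available port
        settings.put("context.path", "/" + contextId);  // assumption: path needs a leading slash
        settings.put("other.info", otherInfo);          // must match the serviceSelector filter
        return settings;
    }

    /** Builds the filter string used to look up the matching HttpService reference. */
    static String serviceFilter(String otherInfo) {
        return "(other.info=" + otherInfo + ")";
    }

    // Inside a plugin, the dictionary would presumably be handed to
    // org.eclipse.equinox.http.jetty.JettyConfigurator.startServer(id, settings),
    // and the filter used when requesting the HttpService service reference.
}
```

Keeping the other.info value in one place makes it harder for the Jetty settings and the service lookup filter to drift apart.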

The next step is to create a custom servlet that will intercept the relevant HTTP requests and load the content from your archive. The complete (and very simple) example can be found in the org.eclipse.help.internal.webapp.servlet.ContentServlet class (registered with the third extension point above). In its init() method it creates a custom connector instance (more info below) and uses it in the doGet() and doPost() methods.

The last piece is the connector itself. It analyzes the incoming request, maps it to the corresponding resource and then transfers the resource contents to the response output stream. The beauty of this connector is that the content can come from anywhere. It can be a local file, a file in an archive, or it can be dynamically generated (corresponding to the requested resource, of course).

While EclipseConnector looks at a variety of sources to get the content and provides a custom error page implementation, the logic is very simple. In a simple example where all the content comes from one archive, you create a URLClassLoader pointing to that jar (do this in the constructor to make subsequent requests faster) and call its getResourceAsStream method, passing the trailing portion of the URL (with the host, port, primary ID and secondary ID parts stripped away). If the returned InputStream is null, you can return a custom error page.
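
A minimal sketch of such a single-archive connector might look like the following. It is not the EclipseConnector implementation; the ArchiveConnector class, its transfer method and the error page text are all hypothetical, and a real servlet would pass its response output stream as the out argument.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical connector that serves resources out of a single jar. */
final class ArchiveConnector implements Closeable {
    private static final byte[] ERROR_PAGE =
            "<html><body>Page not found</body></html>".getBytes(StandardCharsets.UTF_8);

    private final URLClassLoader loader;

    /** Creates the class loader once so that later requests can reuse it. */
    ArchiveConnector(Path jar) throws IOException {
        // Null parent: the application classpath is not searched for resources.
        this.loader = new URLClassLoader(new URL[] { jar.toUri().toURL() }, null);
    }

    /**
     * Copies the named resource to the output stream.
     * Returns false, after writing the error page, when the resource is missing.
     */
    boolean transfer(String resourcePath, OutputStream out) throws IOException {
        // Class loader resource names must not start with a slash.
        String name = resourcePath.startsWith("/") ? resourcePath.substring(1) : resourcePath;
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                out.write(ERROR_PAGE);
                return false;
            }
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return true;
        }
    }

    /** Releases the jar file held by the class loader. */
    @Override
    public void close() throws IOException {
        loader.close();
    }
}
```

The boolean return value gives the calling servlet a chance to set a 404 status in addition to sending the error page.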

While the above may sound like overkill, it is quite useful and much more flexible than shipping a huge collection of separate files. With a custom HTTP server and a servlet you can control the contents of the error page, fetch the content from multiple locations, or even create the content dynamically from a database or another source.