WebSpectra Technical Notes
Craig A. Merlic, Barry C. Fam and Michael M. Miller
Department of Chemistry and Biochemistry
University of California, Los Angeles, California 90095-1569
Display Issues
Given the digital nature of modern spectroscopic data such as NMR and IR spectra, it would seem obvious to use the WWW to present this information. However, while posting documents and most graphics on the WWW is now routine, presentation of high-resolution spectral data poses special problems. In the WebSpectra site, spectra are presented in the Graphics Interchange Format (GIF), a common format that is directly supported by WWW browsers, while spectral precision is maintained through a dynamic spectrum magnification feature. The important issue on this latter point is that spectral magnification is principally a one-dimensional function, yet no WWW browser or plug-in supports this type of image manipulation.
Dynamic presentation of spectra in the GIF format is made practical through the integration of a common gateway interface (CGI) program and JavaScript into the WWW site. This software addresses two related issues associated with the use of the GIF image format: size and resolution. The delivery of a complete, high-resolution spectrum in GIF format is generally impractical because the size of the resulting image is unmanageable. On the other hand, presenting a smaller image results in loss of resolution, which may make spectral analysis very difficult. The WebSpectra CGI and JavaScript programs solve the problems of size and resolution by presenting "only what is needed" at any one time. Spectra are displayed first as a low-resolution image; magnifications then deliver high-resolution images of the selected region.
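The idea of magnifying along one axis only can be illustrated with a minimal Python sketch. It is not the WebSpectra CGI code: the function name, the 0/1 pixel representation, and the rule of keeping ink wherever any source column has ink are all assumptions made for illustration.

```python
def magnify_horizontal(rows, x0, x1, out_width):
    """Resample columns [x0, x1) of a bitmap to out_width columns.

    rows is a list of equal-length lists of 0/1 pixels (1 = ink).
    Only the horizontal axis is rescaled; the row count is unchanged.
    """
    if not (0 <= x0 < x1 <= len(rows[0])):
        raise ValueError("window lies outside the image")
    span = x1 - x0
    out = []
    for row in rows:
        new_row = []
        for i in range(out_width):
            # Source columns covered by output column i (always at least one).
            lo = x0 + (i * span) // out_width
            hi = max(x0 + ((i + 1) * span) // out_width, lo + 1)
            # Assumed rule: keep ink if any source column has ink,
            # so that narrow peaks survive in the low-resolution overview.
            new_row.append(max(row[lo:hi]))
        out.append(new_row)
    return out


overview = magnify_horizontal([[0, 0, 1, 0, 0, 0, 0, 1]], 0, 8, 4)
zoomed = magnify_horizontal([[0, 0, 1, 0, 0, 0, 0, 1]], 2, 4, 4)
```

Calling the function on the full width gives the low-resolution overview; calling it on a narrow window gives the magnified view at the same output width.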
Technical Details of Implementation
The capabilities of the WebSpectra system were implemented through initial acquisition and processing of spectral data, and subsequent development of the WebSpectra interface software. Processing of both NMR and IR spectra includes the graphics format and layout conversions necessary to store the spectra in a form usable by the CGI and presentable to the user. The WebSpectra CGI is capable of reading, processing and displaying appropriate information based on the requests of the user as described earlier. JavaScript code is used in combination with the CGI to display user-requested regions for subplots.
One of the important points in processing and storage of spectra is the preservation of the high-resolution data. This brings up immediate problems in the first step of processing, data acquisition. All of the proton and carbon NMR spectra on the site were obtained using a Bruker ARX400 spectrometer (1). All processing of these spectra, including Fourier transformation (2), phasing, baseline correction, and integration, was performed using Bruker's proprietary xwinnmr software (3). At this point, the only way to store the fully processed spectra is in PostScript format (4). Due to a limitation in the Bruker software, the maximum resolution possible in this format is only 72 dots per inch (dpi). This results in unacceptable data loss, so the solution was to expand spectra across multiple "pages" within the PostScript file. By spanning a greater area, spectral accuracy is maintained even at the low resolution per unit area.
Due to the nature of the PostScript format, data files are not readily exportable to many other common graphics formats. Therefore, the next processing step is translation of the PostScript data, using Aladdin's Ghostscript software (5), into PCX graphics format (6). This conversion is relatively straightforward, although one PCX file is produced for each of the virtual pages in the PostScript file. At this point, the vertical resolution is reduced to 36 dpi to reduce file size and increase computational speed, since extreme precision is not needed in that direction.
Because the specifications for the PCX format are publicly available (6), it was possible to create a series of utilities that read PCX images and manipulate them freely. By this method, the PCX files generated in the last step were converted to a storage format and merged into one large image containing the entire spectrum. For DEPT spectra, the four individual spectra (0°, 45°, 90°, 135°) were combined into one image at this point as well. Because care was taken to preserve data resolution, each resulting image file has dimensions of 12672 x 306 pixels. It is the format of this final image that is particularly important in the speed and practicality of the WebSpectra site for end users; the efficiency of the PostScript and PCX file formats is of secondary importance since they are only used in data processing.
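The stated image dimensions can be checked against the stated resolutions with a little arithmetic, and the page-merging step can be sketched in a few lines of Python. The 72 dpi, 36 dpi and 12672 x 306 figures come from the text; the page width of 11 inches (landscape US Letter) and the `merge_pages` helper are assumptions, since neither the page size nor the page count is given here.

```python
H_DPI = 72                          # horizontal resolution stated in the text
V_DPI = 36                          # reduced vertical resolution stated in the text
WIDTH_PX, HEIGHT_PX = 12672, 306    # final image size stated in the text
ASSUMED_PAGE_WIDTH_IN = 11          # assumption: landscape US Letter pages


def merge_pages(pages):
    """Join page bitmaps (each a list of pixel rows) side by side, left to right."""
    height = len(pages[0])
    if any(len(page) != height for page in pages):
        raise ValueError("all pages must have the same height")
    return [[px for page in pages for px in page[r]] for r in range(height)]


width_in = WIDTH_PX / H_DPI                        # physical plot width in inches
height_in = HEIGHT_PX / V_DPI                      # physical plot height in inches
implied_pages = width_in / ASSUMED_PAGE_WIDTH_IN   # page count under the assumption
```

Under these assumptions the image corresponds to a plot 176 inches wide and 8.5 inches tall, which would be consistent with 16 landscape pages laid end to end.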
At this point, the processing of 1D NMR and IR spectra converges (Figure 1), while that of 2D NMR spectra diverges from both. The manipulation of 2D NMR spectra will be discussed later. Bringing IR spectra to the final storage format is much simpler than for NMR spectra. Data were acquired on a Nicolet 510P FT-IR (7) and saved as raw XY plot data. IR spectra do not require any of the complex secondary processing, for example integration or phasing, involved in NMR spectra. Therefore, it was possible to write another utility program that generates a plot file of this data directly in the final WebSpectra image storage format.
This final image format, in which both types of one-dimensional spectra are stored on the WWW server, is a non-standard, compressed image format designed specifically for WebSpectra. This "WebSpectra format," designated as the SPC format, was implemented so that access by the CGI program is as fast as possible: about 5 times faster than with GIF or uncompressed formats. The SPC format is more efficient than other image formats because the spectral data files are extremely large, owing to their high horizontal resolution. Other image formats are usually better suited to smaller images such as pictures, and have problems dealing with the large spectrum images. By fine-tuning the properties of the SPC image format, it has been possible to minimize access time to the point where WWW delivery is seamless.
A problem encountered when using graphic formats such as GIF or PCX to store spectrum data is that, in order to access a particular part of an image, the entire image must be decompressed. In most cases, especially at higher magnifications, this is unnecessary, since only a small portion of that data is actually being displayed. Uncompressed image formats do not have this problem but instead suffer from slow disk access times, since uncompressed spectra can reach more than 4 megabytes in size. The SPC format overcomes both problems by storing data in sections, such that it is possible to jump immediately to any specific portion of the image, dramatically decreasing access time since only the necessary data is retrieved. In addition, each section is individually compressed, so the SPC format produces files that are approximately 40 times smaller than the raw data. Thus, with the SPC format it is practical not only to deliver spectra at high resolution, but also to store a large number of such spectra in a relatively small amount of disk space.
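The principle of sectioned, individually compressed storage can be sketched in Python as follows. This is not the actual SPC layout, which is not specified here: the header fields, the offset table, the use of vertical strips, the choice of zlib, and the function names are all assumptions chosen to illustrate random access into a compressed image.

```python
import struct
import zlib

HEADER = struct.Struct("<IIII")  # hypothetical: width, height, strip width, strip count


def pack_sections(rows, strip_width):
    """Serialize a bitmap (list of equal-length bytes rows) as compressed vertical strips."""
    width, height = len(rows[0]), len(rows)
    chunks = []
    for x in range(0, width, strip_width):
        strip = b"".join(row[x:x + strip_width] for row in rows)
        chunks.append(zlib.compress(strip))  # each strip is compressed on its own
    # Offset table: start of every compressed strip, plus one end sentinel.
    offsets, pos = [], HEADER.size + 4 * (len(chunks) + 1)
    for chunk in chunks:
        offsets.append(pos)
        pos += len(chunk)
    offsets.append(pos)
    header = HEADER.pack(width, height, strip_width, len(chunks))
    table = struct.pack(f"<{len(offsets)}I", *offsets)
    return header + table + b"".join(chunks)


def read_columns(blob, x0, x1):
    """Return columns [x0, x1) as bytes rows, decompressing only the strips they touch."""
    width, height, sw, n = HEADER.unpack_from(blob, 0)
    offsets = struct.unpack_from(f"<{n + 1}I", blob, HEADER.size)
    if not (0 <= x0 < x1 <= width):
        raise ValueError("window lies outside the image")
    out = [b""] * height
    for s in range(x0 // sw, (x1 - 1) // sw + 1):
        strip = zlib.decompress(blob[offsets[s]:offsets[s + 1]])
        strip_w = min(sw, width - s * sw)      # the last strip may be narrower
        lo = max(x0 - s * sw, 0)
        hi = min(x1 - s * sw, strip_w)
        for r in range(height):
            out[r] += strip[r * strip_w + lo:r * strip_w + hi]
    return out


image = [bytes(range(10)), bytes(range(10, 20))]
blob = pack_sections(image, 4)
window = read_columns(blob, 3, 9)
```

With a file on disk, the same offset table would allow a seek directly to the first required strip, so the cost of a request grows with the width of the window rather than with the width of the whole spectrum.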
Returning to 2D NMR spectra such as COSY, these are stored as multiple GIF images. Each spectrum is divided into several regions and stored at multiple magnification levels. Because these sections of the 2D spectra are relatively small, it is practical to store them in the GIF rather than SPC format, saving the CPU time costs of on-the-fly processing. In addition, two versions of these images are stored: one normal and one with "inverted colors." The inverted images are used to create the appearance of cursor-selected regions for the user. JavaScript code is embedded in the HTML shown to the user that dynamically replaces the region under the cursor with this second set of "highlighted" sections. The overall effect is of a shaded region moving across the spectrum underneath a moving cursor. When the user selects one of these regions, the CGI handles finding and delivering the new and more magnified spectral section.
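The region-highlighting logic can be sketched as follows. The site implements this in JavaScript; the sketch below uses Python only to show the lookup, and it assumes a uniform grid of regions, which the text does not state. The function names and the "normal"/"inverted" labels are hypothetical.

```python
def region_at(x, y, image_w, image_h, cols, rows):
    """Return (row, col) of the grid region containing pixel (x, y), or None if outside."""
    if not (0 <= x < image_w and 0 <= y < image_h):
        return None
    return (y * rows // image_h, x * cols // image_w)


def tile_states(hover, cols, rows):
    """Choose which stored version of each region to show for a given hover region."""
    return {
        (r, c): ("inverted" if (r, c) == hover else "normal")
        for r in range(rows)
        for c in range(cols)
    }


hover = region_at(250, 10, 400, 400, 4, 4)
states = tile_states(hover, 4, 4)
```

As the cursor moves, only the region that gains or loses the hover needs its image swapped, which is what produces the impression of a shaded block following the cursor.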
Although the spectral images are the most important component of the WebSpectra database, there are several auxiliary files that were also generated during data processing. These files are in plain text and contain necessary information relating to the corresponding spectra such as compound name and formula, solvent(s) used, and for 1D spectra, signal frequencies. The signal frequencies are generated from the Bruker xwinnmr software mentioned in the initial step in NMR processing. These are displayed beside the spectrum and allow calculation of coupling constants as well as location of signals and magnification. Although important, storage of these values is straightforward and much simpler than the spectral images themselves.
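As an illustration of how listed signal frequencies lead to coupling constants, the following Python sketch takes the line frequencies of one first-order multiplet in Hz and returns the spacings between adjacent lines. The function names and the example frequencies are invented for illustration, and the nominal 400.0 MHz proton frequency is an assumption based on the spectrometer model; the exact operating frequency is not given here.

```python
def coupling_constants(line_freqs_hz):
    """Spacings in Hz between adjacent lines of a first-order multiplet."""
    ordered = sorted(line_freqs_hz, reverse=True)
    return [round(a - b, 2) for a, b in zip(ordered, ordered[1:])]


def hz_to_ppm(freq_hz, spectrometer_mhz=400.0):
    """Convert a frequency offset in Hz to a chemical shift in ppm.

    spectrometer_mhz is an assumed nominal value, not one taken from the text.
    """
    return freq_hz / spectrometer_mhz


triplet = [1452.1, 1445.0, 1437.9]        # made-up line positions in Hz
j_values = coupling_constants(triplet)    # spacing between neighbouring lines
shift = hz_to_ppm(1445.0)                 # shift of the central line in ppm
```

Because coupling constants are differences in Hz, they do not depend on the spectrometer frequency, whereas the conversion to ppm does.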
Once spectra are stored on the WWW server, the final hurdle is presentation of appropriately formatted images to the user. The presentation of 2D spectra is relatively simple, since there is little dynamic image processing involved. For 1D spectra, the WebSpectra CGI program handles reading of the stored high-resolution spectrum images, processing and creation of a temporary GIF image, and final display on a WWW page. Spectra are read in the WebSpectra image format described earlier. Spectral images are then processed and compressed using the gd GIF library (8). Processing includes calculation and insertion of a ppm calibration line beneath the spectrum, as well as any magnification requested by the user.
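Placing labels on a calibration line for an arbitrary magnified window can be sketched as below. This is an illustrative Python version, not the CGI's actual routine: the function name, the target of about ten ticks, and the 1-2-5 step rule are assumptions. The ppm scale is taken to decrease from left to right, as is conventional for NMR spectra.

```python
import math


def ppm_ticks(ppm_left, ppm_right, width_px, target_ticks=10):
    """Return (x_pixel, ppm) pairs for a scale whose ppm values decrease left to right."""
    span = ppm_left - ppm_right
    if span <= 0:
        raise ValueError("ppm_left must exceed ppm_right")
    raw = span / target_ticks
    mag = 10 ** math.floor(math.log10(raw))
    # Smallest "round" step (1, 2, 5 or 10 times a power of ten) not below raw.
    step = next(m * mag for m in (1, 2, 5, 10) if m * mag >= raw * (1 - 1e-9))
    # Work in integer multiples of the step to avoid accumulating float error.
    k_hi = math.floor(ppm_left / step + 1e-9)
    k_lo = math.ceil(ppm_right / step - 1e-9)
    ticks = []
    for k in range(k_hi, k_lo - 1, -1):
        ppm = round(k * step, 10)
        x = round((ppm_left - ppm) / span * (width_px - 1))
        ticks.append((x, ppm))
    return ticks


full = ppm_ticks(10.0, 0.0, 1001)     # whole-spectrum view
zoom = ppm_ticks(7.5, 7.0, 501)       # magnified window
```

The same function serves every magnification level, because the tick spacing is derived from the width of the requested window rather than fixed in advance.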
The end result of this long series of processing steps is the set of pages that make up the WebSpectra site. Although the steps of processing and presentation are complex, they go a long way towards making the student user's life less complex. Through this intricate processing scheme, it is possible to present students with high-quality yet simple and manageable spectral images that allow them to focus on the most important issue: solving the problems at hand.
References
Figure 1: Flow diagram for WebSpectra spectrum processing.