WebSpectra Technical Notes
Craig A. Merlic, Barry C. Fam and Michael M. Miller
Department of Chemistry and Biochemistry
University of California, Los Angeles, California 90095-1569
Given the digital nature of modern spectroscopic data such as NMR and IR spectra, it would seem obvious to use the WWW to present this information. However, while posting documents and most graphics on the WWW is now routine, presentation of high-resolution spectral data poses special problems. In the WebSpectra site, spectra are presented in the Graphics Interchange Format (GIF), a common format that is directly supported by WWW browsers, while spectral precision is maintained through a dynamic spectrum magnification feature. The important issue on this latter point is that spectral magnification is principally a one-dimensional function, yet no WWW browser or plug-in supports this type of image manipulation.
Technical Details of Implementation
One of the important points in the processing and storage of spectra is the preservation of the high-resolution data. This raises immediate problems in the first step of processing, data acquisition. All of the proton and carbon NMR spectra on the site were obtained using a Bruker ARX400 spectrometer (1). All processing of these spectra, including Fourier transformation (2), phasing, baseline correction, and integration, was performed using Bruker's proprietary xwinnmr software (3). At this point, the only way to store the fully processed spectra is in PostScript format (4). Due to a limitation in the Bruker software, the maximum resolution possible in this format is only 72 dots per inch (dpi). This results in unacceptable data loss, so the solution was to expand spectra across multiple "pages" within the PostScript file. By spanning a greater area, spectral accuracy is maintained even at the low resolution per unit area.
Due to the nature of the PostScript format, data files are not readily exportable to many other common graphics formats. Therefore, the next processing step is translation of the PostScript data, using Aladdin's Ghostscript software (5), into PCX graphics format (6). This conversion is relatively straightforward, although one PCX file is produced for each of the virtual pages in the PostScript file. At this point, the vertical resolution is reduced to 36 dpi to reduce file size and increase computational speed, since extreme precision is not needed in that direction.
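A conversion of this kind can be scripted. The sketch below only assembles a Ghostscript command line; it assumes the pcxmono output device, a 72 dpi horizontal by 36 dpi vertical rendering resolution, and a hypothetical output file pattern, none of which are specified in these notes.

```python
def ghostscript_pcx_command(ps_path, out_pattern="page%02d.pcx",
                            h_dpi=72, v_dpi=36, device="pcxmono"):
    """Build a Ghostscript argument list that renders every page of a
    PostScript file to its own PCX file.

    The device name, file pattern, and resolutions are illustrative
    assumptions, not the settings used for WebSpectra.
    """
    return [
        "gs",
        "-dNOPAUSE",                      # do not pause between pages
        "-dBATCH",                        # exit after the last page
        "-dSAFER",                        # restrict file access from PostScript code
        f"-sDEVICE={device}",             # monochrome PCX output
        f"-r{h_dpi}x{v_dpi}",             # horizontal x vertical resolution in dpi
        f"-sOutputFile={out_pattern}",    # %02d expands to the page number
        ps_path,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where Ghostscript is installed.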
Because the specifications for the PCX format are publicly available (6), it was possible to create a series of utilities that read PCX images and manipulate them freely. By this method, the PCX files generated in the last step were converted to a storage format and merged into one large image containing the entire spectrum. For DEPT spectra, the four individual spectra (0°, 45°, 90°, 135°) were combined into one image at this point as well. Because care was taken to preserve data resolution, each resulting image file has dimensions of 12672x306 pixels. It is the format of this final image that is particularly important for the speed and practicality of the WebSpectra site for end users; the efficiency of the PostScript and PCX file formats is of secondary importance, since they are used only in data processing.
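The stated image size can be checked with a little arithmetic. The sketch below assumes that each virtual page is US Letter in landscape orientation (11 in by 8.5 in); this is an inference from the numbers, not something stated in these notes. Under that assumption a page rendered at 72 dpi horizontally and 36 dpi vertically is 792x306 pixels, and sixteen such pages side by side give 12672x306. The merge function is a hypothetical stand-in for the authors' utilities and works on plain lists of pixel rows.

```python
# Assumed page geometry (not stated in the source).
PAGE_WIDTH_IN = 11.0
PAGE_HEIGHT_IN = 8.5
H_DPI = 72   # horizontal resolution limit described above
V_DPI = 36   # reduced vertical resolution described above


def page_pixels():
    """Return (width, height) in pixels of one rendered page."""
    return round(PAGE_WIDTH_IN * H_DPI), round(PAGE_HEIGHT_IN * V_DPI)


def merge_pages(pages):
    """Join page rasters left to right into one wide raster.

    Each page is a list of rows and each row is a list of pixel values.
    All pages must have the same number of rows.
    """
    height = len(pages[0])
    if any(len(page) != height for page in pages):
        raise ValueError("all pages must have the same height")
    merged = []
    for y in range(height):
        row = []
        for page in pages:
            row.extend(page[y])
        merged.append(row)
    return merged


if __name__ == "__main__":
    width, height = page_pixels()
    print(width, height, 12672 // width)
```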
At this point, the processing of 1D NMR and IR spectra converges (Figure 1), while that of 2D NMR spectra diverges from both. The manipulation of 2D NMR spectra will be discussed later. Bringing IR spectra to the final storage format is much simpler than it is for NMR spectra. Data was acquired on a Nicolet 510P FT-IR (7) and saved as raw XY plot data. IR spectra do not require any of the complex secondary processing, such as integration or phasing, involved in NMR spectra. Therefore, it was possible to write another utility program that generates a plot file of this data directly in the final WebSpectra image storage format.
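Turning XY data into a plot image can be illustrated with a small rasterizer. The sketch below is not the authors' utility: it assumes the points are sorted by increasing x, interpolates linearly between them, and writes into a plain list-of-rows bitmap rather than the WebSpectra storage format.

```python
def rasterize_xy(points, width, height):
    """Plot (x, y) points as a connected trace in a 0/1 bitmap.

    points : list of (x, y) tuples sorted by increasing x (at least two)
    width, height : bitmap size in pixels (width must be at least 2)
    Returns a list of `height` rows, each a list of `width` pixels,
    with row 0 at the top of the plot.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = xs[0], xs[-1]
    y_min, y_max = min(ys), max(ys)
    y_span = (y_max - y_min) or 1.0          # avoid division by zero for flat data
    image = [[0] * width for _ in range(height)]
    prev_row = None
    j = 0                                     # index of the current data segment
    for col in range(width):
        x = x_min + (x_max - x_min) * col / (width - 1)
        while j < len(xs) - 2 and xs[j + 1] < x:
            j += 1
        x0, x1 = xs[j], xs[j + 1]
        t = (x - x0) / (x1 - x0) if x1 != x0 else 0.0
        y = ys[j] + t * (ys[j + 1] - ys[j])   # linear interpolation
        row = round((y_max - y) / y_span * (height - 1))
        # Fill the vertical gap to the previous column so the trace is continuous.
        lo, hi = (row, row) if prev_row is None else (min(row, prev_row), max(row, prev_row))
        for r in range(lo, hi + 1):
            image[r][col] = 1
        prev_row = row
    return image
```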
This final image format, in which both types of one-dimensional spectra are stored on the WWW server, is a non-standard, compressed image format designed specifically for WebSpectra. This "WebSpectra format," designated as the SPC format, was implemented so that access by the CGI program is as fast as possible, about 5 times faster than with GIF or uncompressed formats. The efficiency of the WebSpectra spectrum format, or the inefficiency of other image formats, stems from the fact that the spectral data files are extremely large because of their high horizontal resolution. Other image formats are usually more suitable for smaller images such as pictures, and have problems dealing with the large spectrum images. By fine-tuning the properties of the SPC image format, it has been possible to minimize access time to the point where WWW delivery is seamless.
A problem encountered when using graphic formats such as GIF or PCX to store spectrum data is that, in order to access a particular part of an image, the entire image must be decompressed. In most cases, especially at higher magnifications, this is unnecessary, since only a small portion of that data is actually being displayed. Uncompressed image formats do not have this problem but instead suffer from slow disk access times, since uncompressed spectra can reach more than 4 megabytes in size. The SPC format overcomes both problems by storing data in sections, such that it is possible to jump immediately to any specific portion of the image, dramatically decreasing access time since only the necessary data is retrieved. In addition, each section is individually compressed, so the SPC format produces files that are approximately 40 times smaller than the raw data. Thus, with the SPC format it is practical not only to deliver spectra at high resolution, but also to store a large number of such spectra in a relatively small amount of disk space.
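The idea of independently compressed sections with direct access can be sketched as follows. The actual SPC layout and compression scheme are not described in these notes, so everything here is an assumption for illustration: the header fields, the offset index, and the use of zlib as the compressor are all hypothetical.

```python
import struct
import zlib


def pack_sections(data: bytes, section_size: int) -> bytes:
    """Split raw image bytes into fixed-size sections, compress each one
    separately, and prepend an index of compressed end offsets."""
    chunks = [zlib.compress(data[i:i + section_size])
              for i in range(0, len(data), section_size)]
    offsets = []
    pos = 0
    for chunk in chunks:
        pos += len(chunk)
        offsets.append(pos)               # end offset of each chunk in the payload
    header = struct.pack("<II", section_size, len(chunks))
    index = struct.pack("<%dI" % len(chunks), *offsets)
    return header + index + b"".join(chunks)


def read_range(blob: bytes, start: int, length: int) -> bytes:
    """Return raw bytes [start, start + length) while decompressing only
    the sections that overlap that range."""
    section_size, count = struct.unpack_from("<II", blob, 0)
    offsets = struct.unpack_from("<%dI" % count, blob, 8)
    payload_start = 8 + 4 * count
    first = start // section_size
    last = (start + length - 1) // section_size
    out = bytearray()
    for s in range(first, last + 1):
        begin = payload_start + (offsets[s - 1] if s else 0)
        end = payload_start + offsets[s]
        out += zlib.decompress(blob[begin:end])
    skip = start - first * section_size
    return bytes(out[skip:skip + length])
```

A reader that wants only a magnified window computes which sections cover that window and touches nothing else, which is the property the paragraph above attributes to the SPC format.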
Although the spectral images are the most important component of the WebSpectra database, there are several auxiliary files that were also generated during data processing. These files are in plain text and contain necessary information relating to the corresponding spectra such as compound name and formula, solvent(s) used, and for 1D spectra, signal frequencies. The signal frequencies are generated from the Bruker xwinnmr software mentioned in the initial step in NMR processing. These are displayed beside the spectrum and allow calculation of coupling constants as well as location of signals and magnification. Although important, storage of these values is straightforward and much simpler than the spectral images themselves.
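As an illustration of how listed signal frequencies lead to coupling constants, the sketch below takes the line frequencies of one multiplet in Hz and returns the spacings between adjacent lines. The function names and the nominal 400 MHz proton frequency used for the ppm conversion are assumptions for this example, not values taken from the WebSpectra files.

```python
def coupling_constants(line_freqs_hz):
    """Return spacings in Hz between adjacent lines of a multiplet.

    For a first-order multiplet these spacings are the coupling constants J.
    """
    ordered = sorted(line_freqs_hz, reverse=True)
    return [round(a - b, 2) for a, b in zip(ordered, ordered[1:])]


def hz_to_ppm(freq_hz, spectrometer_mhz=400.0):
    """Convert a frequency offset in Hz to a chemical shift in ppm.

    spectrometer_mhz is the observe frequency; 400.0 is a nominal value
    assumed here for a 400 MHz proton spectrum.
    """
    return freq_hz / spectrometer_mhz


if __name__ == "__main__":
    quartet = [1480.0, 1472.9, 1465.8, 1458.7]   # made-up example values
    print(coupling_constants(quartet))
    print(hz_to_ppm(sum(quartet) / len(quartet)))
```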
Once spectra are stored on the WWW server, the final hurdle is presentation of appropriately formatted images to the user. The presentation of 2D spectra is relatively simple, since there is little dynamic image processing involved. For 1D spectra, the WebSpectra CGI program handles reading of the stored high-resolution spectrum images, processing and creation of a temporary GIF image, and final display on a WWW page. Spectra are read in the WebSpectra image format described earlier. Spectral images are then processed and compressed using the gd GIF library (8). Processing includes calculation and insertion of a ppm calibration line beneath the spectrum, as well as any magnification requested by the user.
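Magnification along the horizontal axis only can be sketched in a few lines. The example below is a simplified illustration, not the WebSpectra CGI code: it assumes the image is a list of pixel rows, uses nearest-neighbour sampling, and leaves out GIF encoding and the ppm calibration line.

```python
def magnify_horizontal(image, center_col, factor, out_width):
    """Stretch a window of columns to out_width pixels, leaving rows untouched.

    image      : list of rows, each a list of pixel values
    center_col : column of the full image to center the view on
    factor     : magnification (1 shows the whole width)
    out_width  : width in pixels of the image returned to the browser
    """
    full_width = len(image[0])
    window = max(1, round(full_width / factor))        # columns visible at this zoom
    start = min(max(center_col - window // 2, 0), full_width - window)
    return [
        [row[start + (x * window) // out_width] for x in range(out_width)]
        for row in image
    ]
```

Because only the column index is rescaled, peak heights are unchanged while closely spaced signals are spread apart, which matches the one-dimensional magnification described above.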
The end result of this long series of processing steps is the set of pages that make up the WebSpectra site. Although the steps of processing and presentation are complex, they go a long way towards making the student user's life less complex. Through this intricate processing scheme, it is possible to present students with high-quality, yet simple and manageable spectral images which allow them to focus on the most important issue: solving the problems at hand.
Figure 1: Flow diagram for WebSpectra spectrum processing.