Medical data compression

AD Instruments

Moderators: Lang White (University of Adelaide) and Charles Pearce (University of Adelaide)

Introduction and Background

AD Instruments has approached MISG with a project proposal in medical data compression. Their main problem involves compression of 16-bit floating-point data. They are interested in both lossless and lossy techniques, but for lossy schemes it is important to be able to quantify accurately any loss that occurs. Approaches based on signal modelling are also mentioned as an area of interest.

Their current scheme works on integer data and uses differential run-length encoding. They also claim that this technique achieves about the same degree of compression as classic lossless techniques such as Huffman encoding, and that dictionary-based schemes such as Lempel-Ziv are not effective even on the integer data. They express some concern about how effective such methods would be on floating-point data.
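The details of AD Instruments' encoder are not given here, so the following is only a generic sketch of what differential run-length encoding can look like on integer samples: take successive differences, then collapse runs of equal differences into (value, count) pairs. The function names and the sample values are illustrative assumptions, not part of their scheme.

```python
def delta_encode(samples):
    """Return the first sample followed by successive differences."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def run_length_encode(values):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, count] pairs back into a flat list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical integer samples: flat sections and a constant-slope ramp
# both turn into long runs after differencing.
samples = [100, 100, 100, 100, 102, 104, 106, 108, 108, 108]
encoded = run_length_encode(delta_encode(samples))
decoded = delta_decode(run_length_decode(encoded))
```

The round trip is exact, so the scheme is lossless. It pays off only when the differences repeat exactly, which is part of why its effectiveness on floating-point data is in doubt.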

More specifically, the questions of interest to them are:

  • Can better lossless compression be achieved by use of data modelling?
  • Can the compression be improved by permitting some controlled loss in the compression scheme?
  • Can floating point data be directly compressed without moving to an integer representation?

The purpose of this document is to illustrate some properties of a data set provided by AD Instruments, and to apply two simple non-model-based methods to the data.

Data Analysis

Consider figure 1, which shows a real-valued time series from the data set together with the magnitude of its discrete Fourier transform (DFT), positive frequencies only. It is immediately evident from the spectrum that the data has been oversampled by a factor of 3. Decimating by this factor (the MATLAB function decimate was used) therefore reduces the data set to a third of its size, yielding the signal and DFT shown in figure 2. The mean has also been subtracted from the data.

Figure 1: (a) the time series; (b) magnitude of its DFT (positive frequencies only)

Figure 2: (a) the decimated, mean-subtracted signal; (b) magnitude of its DFT
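The decimation step can be sketched in Python with NumPy as follows. This is a stand-in rather than a reproduction of the MATLAB decimate call: it uses a simple windowed-sinc FIR low-pass filter, whereas decimate by default uses a Chebyshev IIR filter. The test signal, the filter length and the name decimate_by are assumptions made for illustration.

```python
import numpy as np


def decimate_by(x, factor, taps=61):
    """Low-pass filter x with a windowed-sinc FIR, then keep every factor-th sample."""
    n = np.arange(taps) - (taps - 1) / 2
    # Ideal low-pass with cutoff at the new Nyquist frequency
    # (0.5 / factor cycles per original sample), tapered by a Hamming window.
    h = np.sinc(n / factor) / factor * np.hamming(taps)
    h /= h.sum()  # normalise to unit gain at zero frequency
    filtered = np.convolve(x, h, mode="same")
    return filtered[::factor]


# Synthetic stand-in for the measured data: an offset plus two sinusoids
# that sit well inside the lowest third of the band.
t = np.arange(3000)
x = 5.0 + np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 0.03 * t)

y = decimate_by(x, 3)
y = y - y.mean()  # subtract the mean, as was done for figure 2

# DFT magnitude at positive frequencies, in cycles per decimated sample.
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y))
peak_freq = freqs[np.argmax(spectrum)]
```

A component at frequency f in the original signal appears at 3f after decimation by 3, so frequencies read from the decimated spectrum need to be divided by 3 to compare them with the original spectrum.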

The data appears to have two superimposed 'periodic' components. One appears to have a line spectrum with a fundamental frequency of 0.029 and both second and third harmonics present. The other appears to have a fundamental at about 0.036, again with second and third harmonics present. There is, however, substantial broadband energy across most of the frequency range.

AR Modelling

Autoregressive modelling is an effective technique in speech analysis. Here we apply a two-sided variant in which the current data point is modelled as a sum of forward and backward predictions:

equation
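A two-sided predictor of this kind can be written, for example, as below. We assume here that p- counts the past samples used and p+ the future samples used; the coefficient symbols a_k and b_k and the residual e(n) are notation introduced for this sketch.

```latex
\hat{x}(n) = \sum_{k=1}^{p^{-}} a_k \, x(n-k) + \sum_{k=1}^{p^{+}} b_k \, x(n+k),
\qquad
e(n) = x(n) - \hat{x}(n)
```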

Optimal least-squares coefficients can be found via the normal equations. Figure 3 plots the relative prediction error norm as each of p+ and p- is varied while the other is held at 10.

Figure 3: relative prediction error norm for various values of p+ and p-, with the other fixed at 10
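A least-squares fit of a two-sided predictor can be sketched with NumPy as follows. The function name, the argument names p_back and p_fwd (numbers of past and future samples) and the test signals are assumptions for illustration. The code solves the least-squares problem with numpy.linalg.lstsq, which gives the same solution as the normal equations when the regression matrix has full column rank.

```python
import numpy as np


def two_sided_ar_fit(x, p_back, p_fwd):
    """Least-squares fit of x[n] from p_back past and p_fwd future samples.

    Assumes p_back + p_fwd >= 1. Returns the coefficient vector (past
    coefficients first, then future ones) and the relative prediction
    error norm ||residual|| / ||target||.
    """
    x = np.asarray(x, dtype=float)
    # Only samples with a full set of neighbours on both sides are predicted.
    idx = np.arange(p_back, len(x) - p_fwd)
    past = [x[idx - k] for k in range(1, p_back + 1)]
    future = [x[idx + k] for k in range(1, p_fwd + 1)]
    A = np.column_stack(past + future)
    target = x[idx]
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coeffs
    return coeffs, np.linalg.norm(residual) / np.linalg.norm(target)


# Sanity check on a pure sinusoid, which one past and one future sample
# predict exactly: x[n] = (x[n-1] + x[n+1]) / (2 cos w).
w = 2 * np.pi * 0.029
pure = np.sin(w * np.arange(500))
coeffs_pure, err_pure = two_sided_ar_fit(pure, 1, 1)

# Hypothetical noisy signal with components at 0.029 and 0.036.
rng = np.random.default_rng(0)
n = np.arange(2000)
noisy = (np.sin(2 * np.pi * 0.029 * n)
         + 0.5 * np.sin(2 * np.pi * 0.036 * n)
         + 0.1 * rng.standard_normal(n.size))

# Vary one order while the other is held at 10, as in figure 3.
errors_vs_p_fwd = {p: two_sided_ar_fit(noisy, 10, p)[1] for p in (1, 2, 5, 10)}
errors_vs_p_back = {p: two_sided_ar_fit(noisy, p, 10)[1] for p in (1, 2, 5, 10)}
```

The values in the two dictionaries come from synthetic data and are not comparable with figure 3; the sketch only shows how such a curve can be computed.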

Techniques suggested for modelling include:

  • Autoregressive (AR) and Autoregressive-Moving Average (ARMA) models
  • Subspace methods such as the Karhunen-Loeve transform
  • Wavelet techniques

With such methods we would generally also encode the residuals (i.e. data minus modelled value) at a coarser resolution than the original data. We should also focus on block-based techniques, as real-time processing does not appear to be a requirement.
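Coarse encoding of residuals can be sketched as follows, assuming a uniform quantiser and a model whose values the decoder can regenerate. With quantiser step q, the reconstruction error is bounded by q/2 per sample, which is one simple way of quantifying the loss. The model, the step size and the function names are hypothetical.

```python
import numpy as np


def encode_residuals(x, model, step):
    """Quantise the residual x - model to integer multiples of step."""
    residual = np.asarray(x, dtype=float) - np.asarray(model, dtype=float)
    return np.round(residual / step).astype(np.int64)


def decode_residuals(codes, model, step):
    """Rebuild the signal from the model values and the quantised residuals."""
    return np.asarray(model, dtype=float) + codes * step


# Hypothetical block of data: a sinusoid at 0.029 plus small noise.
rng = np.random.default_rng(1)
n = np.arange(1000)
model = np.sin(2 * np.pi * 0.029 * n)  # assumed known to the decoder
x = model + 0.05 * rng.standard_normal(n.size)

step = 0.01  # illustrative quantiser step
codes = encode_residuals(x, model, step)
x_hat = decode_residuals(codes, model, step)

# Worst-case reconstruction error over the block; bounded by step / 2.
max_err = np.max(np.abs(x - x_hat))
```

The integer codes span a much smaller range than the original samples when the model fits well, so they can be passed to a lossless coder, and the step size gives direct control over the worst-case error.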