Introduction and Background
AD Instruments has approached MISG with a project proposal in medical data compression.
Their main problem involves compression of 16-bit floating point data. They are interested
in both lossless and lossy techniques, but it is important to be able to accurately
quantify any loss occurring in lossy schemes. Signal modelling based approaches are also
mentioned as an area of interest.
Their current scheme works on integer data and uses differential run-length encoding.
They also claim that this technique achieves about the same degree of compression as
classic lossless techniques such as Huffman encoding. They express some concern about the
effectiveness of such methods on floating point data. They state that dictionary-based
schemes such as Lempel-Ziv are not effective even on the integer data.
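The details of AD Instruments' scheme are not given here, so the following is only a minimal sketch of one common reading of differential run-length encoding: take first differences of the integer samples, then store each run of equal differences as a (value, run length) pair. The function names and the sample data are illustrative assumptions.

```python
from itertools import groupby


def diff_rle_encode(samples):
    """Difference the integer samples, then run-length encode the differences."""
    if not samples:
        return []
    # First difference; the first sample is kept as-is so decoding can start.
    diffs = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    # Collapse runs of equal differences into (value, run_length) pairs.
    return [(value, len(list(run))) for value, run in groupby(diffs)]


def diff_rle_decode(pairs):
    """Invert diff_rle_encode: expand the runs, then accumulate the differences."""
    samples, current = [], 0
    for value, count in pairs:
        for _ in range(count):
            current += value
            samples.append(current)
    return samples


# Illustrative data only: flat segments and a constant-slope ramp compress well.
signal = [5, 5, 5, 5, 7, 9, 11, 11, 11]
encoded = diff_rle_encode(signal)
decoded = diff_rle_decode(encoded)
```

A scheme of this kind is lossless on integers because differencing and accumulation are exact; on floating point data the differences are subject to rounding and long runs of identical differences are less likely, which is consistent with the concern expressed above.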
More specifically, the questions of interest to them are:
- Can better lossless compression be achieved by use of data modelling?
- Can the compression be improved by permitting some controlled loss in the compression
scheme?
- Can floating point data be directly compressed without moving to an integer
representation?
The purpose of this document is to illustrate some properties of a data set provided by
AD Instruments, and to apply two simple non-model-based methods to the data.
Data Analysis
Consider figure 1, which shows a real-valued time series. The magnitude
of the discrete Fourier transform (DFT) of the data is also shown (positive frequencies
only). It is immediately evident from the spectrum that the data has been oversampled by
a factor of 3. Thus by decimation (the MATLAB function decimate was used) we can reduce
the size of the data set by a factor of 3, yielding the signal and its DFT as shown in
figure 2. The data has also had the mean subtracted.
Figure 1
Figure 2
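The decimation step can be sketched as follows. This is not the routine used above: it replaces the decimate function with a hand-written windowed-sinc low-pass filter followed by downsampling, and the filter length, the synthetic test tone and all names are assumptions made for illustration.

```python
import numpy as np


def decimate_by(x, q, taps=61):
    """Low-pass filter with a windowed-sinc FIR, then keep every q-th sample."""
    n = np.arange(taps) - (taps - 1) / 2
    # Ideal low-pass with cutoff at the new Nyquist rate (1/(2q) cycles/sample),
    # tapered by a Hamming window and normalised to unit gain at DC.
    h = np.sinc(n / q) * np.hamming(taps)
    h /= h.sum()
    filtered = np.convolve(x, h, mode="same")
    return filtered[::q]


def dft_magnitude(x):
    """Frequencies (cycles/sample) and DFT magnitude at non-negative frequencies."""
    return np.fft.rfftfreq(len(x)), np.abs(np.fft.rfft(x))


# Synthetic stand-in for the data: an offset tone well inside the retained band.
t = np.arange(3000)
x = 2.0 + np.sin(2 * np.pi * 0.01 * t)

y = decimate_by(x, 3)
y = y - y.mean()                  # subtract the mean
freqs, mag = dft_magnitude(y)
peak = freqs[np.argmax(mag)]      # the tone moves from 0.01 to about 0.03
```

After decimation by 3 every frequency, measured in cycles per sample, is multiplied by 3, so a spectrum confined to the lowest third of the band spreads over the whole band.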

The data appears to have two superimposed 'periodic' components. One appears to have a
line spectrum with a fundamental frequency of 0.029, with both second and third harmonics
present. The other appears to have a fundamental at about 0.036, also with second and
third harmonics present. There is, however, substantial broadband energy across most of
the frequency range.
AR Modelling
Autoregressive modelling is an effective technique in speech analysis. Here we apply a
two-sided technique in which the current data point is modelled as a sum of forward and
backward predictions.
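One way to write such a two-sided predictor is given below. The indexing convention is an assumption: p- is taken to be the number of past samples and p+ the number of future samples, and the coefficient names a_k and b_k are introduced here only for illustration.

```latex
\hat{x}(n) = \sum_{k=1}^{p_-} a_k \, x(n-k) + \sum_{k=1}^{p_+} b_k \, x(n+k),
\qquad
e(n) = x(n) - \hat{x}(n)
```

Under this convention a natural reading of the relative prediction error norm is the ratio of the norm of e to the norm of x, taken over the samples for which all neighbours exist.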
Optimal least-squares coefficients can be found via the normal equations. In figure 3
we plot the relative prediction error norm for various values of p+ and p-, each varied
while the other is held at 10.
Figure 3.
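A least-squares fit of this kind can be sketched as follows. The assumptions are that p_minus counts past samples and p_plus counts future samples, that the relative error is the residual norm divided by the norm of the predicted samples, and that at least one of the two orders is positive. The test signal is synthetic (two tones at the fundamentals noted earlier plus noise) and is not the AD Instruments data.

```python
import numpy as np


def two_sided_ar(x, p_minus, p_plus):
    """Fit x[n] from p_minus past and p_plus future samples by least squares.

    Returns the coefficients and the relative prediction error norm,
    computed over the samples that have a full set of neighbours.
    """
    x = np.asarray(x, dtype=float)
    idx = np.arange(p_minus, len(x) - p_plus)
    # Columns: x[n-1] .. x[n-p_minus], followed by x[n+1] .. x[n+p_plus].
    past = [x[idx - k] for k in range(1, p_minus + 1)]
    future = [x[idx + k] for k in range(1, p_plus + 1)]
    A = np.column_stack(past + future)
    target = x[idx]
    # lstsq minimises the same criterion as the normal equations
    # A^T A c = A^T x, but is numerically better conditioned.
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coeffs
    return coeffs, np.linalg.norm(residual) / np.linalg.norm(target)


# Synthetic stand-in signal, for illustration only.
rng = np.random.default_rng(0)
n = np.arange(2000)
x = (np.sin(2 * np.pi * 0.029 * n)
     + 0.5 * np.sin(2 * np.pi * 0.036 * n)
     + 0.05 * rng.standard_normal(n.size))

coeffs, err_two_sided = two_sided_ar(x, 10, 10)
_, err_one_sided = two_sided_ar(x, 10, 0)
```

Sweeping one order while holding the other at 10 and recording the returned error would produce curves of the kind plotted in figure 3.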
Techniques suggested for modelling include:
- Autoregressive (AR) and Autoregressive-Moving Average (ARMA) models
- Subspace methods such as the Karhunen-Loeve transform
- Wavelet techniques
With such methods we generally also encode the residuals (i.e. data minus modelled value)
at some coarser resolution than the original data. In addition, we should also focus on
block-based techniques, as real-time processing does not appear to be a requirement.
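Coarse encoding of the residuals can be sketched with a uniform quantiser, which also makes the loss easy to quantify: the reconstruction error cannot exceed half the step size. The sketch assumes that the decoder can reproduce the modelled values exactly from the transmitted model parameters. The model output, the step size and all names are hypothetical.

```python
import numpy as np


def quantise_residual(x, model, step):
    """Uniformly quantise the residual x - model to integer codes."""
    residual = np.asarray(x, dtype=float) - np.asarray(model, dtype=float)
    return np.rint(residual / step).astype(np.int64)


def reconstruct(model, codes, step):
    """Rebuild the signal from the modelled values and the residual codes."""
    return np.asarray(model, dtype=float) + codes * step


# Illustrative block of data: a tone plus noise, with the tone as the "model".
rng = np.random.default_rng(1)
n = np.arange(512)
model = np.sin(2 * np.pi * 0.029 * n)            # hypothetical model output
x = model + 0.02 * rng.standard_normal(n.size)
step = 0.01                                      # hypothetical resolution

codes = quantise_residual(x, model, step)
x_hat = reconstruct(model, codes, step)
max_error = np.max(np.abs(x - x_hat))            # bounded by step / 2
```

The integer codes are small when the model fits well, so they can then be passed to a lossless coder, while the choice of step gives a guaranteed bound on the loss.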