Rapid advances in technology allow us to observe our environment on a larger scale and in greater detail.
As a result, hyperspectral imaging is becoming more popular, especially in agriculture, ecology and environmental monitoring, and urban planning. Applications include detecting the concentration of greenhouse gases, monitoring crop irrigation levels to prevent crop waste, and detecting and classifying pollution of water bodies.
If you are not familiar with this topic, see the infographics below to get the basic idea:
Problems limiting Hyperspectral Imaging
In a recent post we highlighted some valuable applications of hyperspectral imaging in agriculture and environmental protection. So why is such a useful technology not commonly applied worldwide? There are multiple obstacles, such as the high cost of satellites and hyperspectral technologies, the lack of a common standard for manufacturing hyperspectral sensors, insufficient labeled data for training, and the high volume of produced data. In this post I’ll focus on the last issue - I will describe what it is all about and what we are doing to tackle it.
The enormous quantities of data produced by hyperspectral sensors are difficult to store, transfer, process, and make sense of. An average hyperspectral image is close to 100x the size of a regular RGB camera image of the same land area, so it requires a dedicated approach to handle efficiently.
How to reduce the size of the data?
There are various ways to reduce the size of the data. They can be divided into two main groups:
Data-independent - meaning that hyperspectral images are not the only type of data that can benefit from such “compression”.
Data-dependent - meaning that domain knowledge has to be applied to know what data transformation can be safely applied without much loss of information.
There are many data-independent ways to compress data. One of the simplest is to reduce the numeric precision of the stored values, usually from int16 (sixteen bits per number) to int8 (eight bits per number) or even fewer bits. Using fewer bits to represent a number of course lowers its accuracy, but if it is done in a smart way the losses can be negligible. Going from int16 to int8 alone already saves 50% of the original space.
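To make the bit-depth idea concrete, here is a minimal sketch of linear quantization from 16-bit to 8-bit values, using only the Python standard library. The sample readings and the function names `quantize_to_uint8` and `dequantize` are made up for illustration. This is one simple scheme among many, not necessarily the one used in practice.

```python
from array import array


def quantize_to_uint8(values):
    """Linearly map integer samples onto 0..255; return (codes, lo, step)."""
    lo, hi = min(values), max(values)
    # Width of one 8-bit level; fall back to 1.0 for a constant signal.
    step = (hi - lo) / 255 or 1.0
    codes = array("B", (round((v - lo) / step) for v in values))
    return codes, lo, step


def dequantize(codes, lo, step):
    """Approximately reconstruct the original samples from 8-bit codes."""
    return [lo + c * step for c in codes]


# Hypothetical 16-bit sensor readings for a few channels of one pixel.
raw = array("h", [1020, 1530, 2040, 3570, 4080, 2550, 1275, 1020])

codes, lo, step = quantize_to_uint8(raw)
restored = dequantize(codes, lo, step)

# Storage ratio of the 8-bit codes relative to the 16-bit originals.
size_ratio = (len(codes) * codes.itemsize) / (len(raw) * raw.itemsize)
# Worst-case reconstruction error; bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(raw, restored))
```

The two values `lo` and `step` have to be stored alongside the codes, which is a negligible overhead for a real image. The reconstruction error never exceeds half a quantization step.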
There are other, less intuitive approaches, such as Principal Component Analysis (PCA), which essentially creates new features (called principal components) from the old ones in such a way that the first principal component carries the most information, the second less, the third even less, and so on. Strictly speaking, the components are ranked by the variance they explain, which serves as a proxy for information. Some of the last components can be discarded with, hopefully, a small loss of information. PCA is widely used nowadays; however, there are some issues that one has to keep in mind:
PCA transforms the features into different ones, which usually makes them harder to interpret.
Discarding components is irreversible: they cannot be restored, so useful information can be lost.
PCA requires access to all initial features, meaning that if data acquisition from a single channel somehow gets disrupted, the PCA transformation has to be recalculated and all previous results might not be comparable with the new ones.
PCA computation is relatively costly and can be difficult to carry out in resource-constrained environments (such as directly onboard a small satellite).
PCA is also a data-agnostic algorithm, which is probably why it is so popular.
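The PCA procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration on a small synthetic datacube, with made-up dimensions and a hypothetical helper `pca_fit`, not a production implementation. It computes the components through a singular value decomposition of the mean-centered pixel matrix, which is one standard way to obtain them.

```python
import numpy as np


def pca_fit(pixels, n_components):
    """Return (mean, components, explained variance ratios) for the pixels."""
    mean = pixels.mean(axis=0)
    # Rows of vt are the principal directions; singular values are returned
    # in decreasing order, so the components are already ranked by variance.
    _, singular_values, vt = np.linalg.svd(pixels - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, vt[:n_components], explained[:n_components]


# Synthetic stand-in for a hyperspectral cube: 20 strongly correlated
# channels generated from 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
height, width, bands = 8, 8, 20
latent = rng.normal(size=(height * width, 2))
mixing = rng.normal(size=(2, bands))
noise = 0.01 * rng.normal(size=(height * width, bands))
cube = (latent @ mixing + noise).reshape(height, width, bands)

# Flatten the spatial dimensions: one row per pixel, one column per channel.
pixels = cube.reshape(-1, bands)
mean, components, explained = pca_fit(pixels, n_components=2)

# Compressed representation: 2 numbers per pixel instead of 20.
scores = (pixels - mean) @ components.T
# Approximate reconstruction; the discarded components are gone for good.
restored = scores @ components + mean
max_error = np.abs(restored - pixels).max()
```

Note that `mean` and `components` are computed from all channels at once, which is why losing a single channel forces the whole transformation to be recalculated.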
Data-dependent approaches are slightly more complex, since they require one to actually look at the data and try to understand what it represents. A typical hyperspectral image can be thought of as a datacube, with height, width, and depth. Height and width convey spatial information, while depth conveys spectral information. In land area classification we are interested in classifying all the pixels, therefore data reduction should be applied in the spectral dimension.
Reducing the size of a hyperspectral image
Each pixel in a hyperspectral image has a hundred or more channels, where each channel represents how much light intensity of a given wavelength was detected by the sensor. In contrast to an RGB sensor, which captures light in only three broad channels, the channels of a hyperspectral sensor are much narrower (see the infographics below), providing a much better spectral resolution. This means that if we know that some channels do not provide much information, we can discard them right away. Examples of such channels are those whose wavelengths coincide with the water vapor absorption bands. Water vapor is present in the atmosphere basically all around the globe except at the poles, and it strongly absorbs light, especially at some of the near-infrared wavelengths.
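Discarding known uninformative channels is straightforward once their indices are known. The sketch below shows the idea on a single pixel spectrum in plain Python. The indices in `NOISY_BANDS` are placeholders chosen for illustration only, not an official list of water absorption channels for any sensor, and `drop_bands` is a hypothetical helper name.

```python
def drop_bands(spectrum, bands_to_drop):
    """Return the spectrum without the channels whose indices are listed."""
    drop = set(bands_to_drop)
    return [value for index, value in enumerate(spectrum) if index not in drop]


# Placeholder 0-based channel indices, NOT an official band list.
NOISY_BANDS = [*range(103, 108), *range(149, 163), *range(219, 224)]

# Dummy 224-channel spectrum in which each value equals its channel index,
# so it is easy to see which channels survived.
spectrum = list(range(224))
kept = drop_bands(spectrum, NOISY_BANDS)
```

The same index list has to be applied to every pixel of the image so that the remaining channels stay aligned across the whole datacube.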
NASA operates a well-known hyperspectral sensor called the Airborne Visible/Infrared Imaging Spectrometer, or AVIRIS for short. It takes images in as many as 224 separate channels. An example spectrum of a single pixel from a hyperspectral image taken by AVIRIS is presented below. The pixel represents a land area covered with soybean crops. There are only 200 channels present, since some were already removed due to the strong water absorption mentioned above. Several broader and weaker water absorption bands are still visible in channels close to 60, 80, and 100.
The majority of hyperspectral data analysis involves hyperspectral image segmentation, meaning assigning a class to each pixel. This class should reflect the content of the land area represented by that pixel. Hyperspectral image segmentation is an important task because it makes it possible to map land cover efficiently and accurately on a large scale. More about the use cases of hyperspectral imaging can be found in my previous blog post or on the infographic.
Looking at the spectra, we realized that adjacent channels have almost the same values, meaning they carry redundant information. We decided to evaluate how dropping every other channel would impact the performance of a machine learning classifier. It turned out that removing 50% of the channels, and therefore 50% of the data size, has a negligible effect on the final accuracy of three popular ML classifiers. We decided to continue with such reduction for larger factors. We actually tried three different methods of data reduction: taking an average of a few adjacent channels, taking the channel with the highest signal-to-noise ratio, and selecting one channel randomly in a given window. They all gave similar results. The image below shows how the spectra of the same pixel change if we apply 2x, 4x, and 8x reduction of the data, as well as the original spectrum for comparison. We can see that the 2x reduction changes virtually nothing. The 4x reduction has some edges here and there, but generally follows the trend of the original data. Only at 8x reduction do we start to see some degradation of the spectrum, something that even smart interpolation cannot revert.
If we push the reduction even further, the data becomes smaller still, at the cost of a visible loss of spectral resolution. At 16x reduction one can barely see the water absorption minima, and at 32x only the general trend remains.
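The three window-based reduction strategies (averaging adjacent channels, keeping the channel with the highest signal-to-noise ratio, and picking one channel at random per window) could look roughly like the sketch below. It works on a single pixel spectrum in plain Python. The synthetic spectrum, the per-channel SNR values, and all function names are assumptions made for illustration. This is not the code used in the experiments.

```python
import math
import random
from statistics import fmean


def windows(n_bands, factor):
    """Yield index ranges of `factor` adjacent channels (last may be shorter)."""
    for start in range(0, n_bands, factor):
        yield range(start, min(start + factor, n_bands))


def average_windows(spectrum, factor):
    """Replace each window of channels by its mean value."""
    return [fmean(spectrum[i] for i in w) for w in windows(len(spectrum), factor)]


def select_best_snr(snr, factor):
    """Pick, per window, the index of the channel with the highest SNR."""
    return [max(w, key=lambda i: snr[i]) for w in windows(len(snr), factor)]


def select_random(n_bands, factor, rng):
    """Pick, per window, the index of one randomly chosen channel."""
    return [rng.choice(w) for w in windows(n_bands, factor)]


# Synthetic smooth 200-channel spectrum and made-up per-channel SNR values.
spectrum = [1000 + 500 * math.sin(i / 20) for i in range(200)]
snr = [10 + (i % 7) for i in range(200)]
rng = random.Random(0)

reduced = {}
for factor in (2, 4, 8):
    best_idx = select_best_snr(snr, factor)
    rand_idx = select_random(len(spectrum), factor, rng)
    reduced[factor] = {
        "average": average_windows(spectrum, factor),
        "best_snr": [spectrum[i] for i in best_idx],
        "random": [spectrum[i] for i in rand_idx],
        "random_indices": rand_idx,
    }
```

The two selection strategies return channel indices rather than values, so the same indices can be reused for every pixel of an image. Averaging needs no such bookkeeping because it depends only on the window size.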
How to take advantage of Hyperspectral Imaging and use it commercially?
Machine learning experiments confirmed our hypothesis that even a massive data reduction (up to 16-32x) does not drastically degrade classification performance on hyperspectral images. Such reduced datasets are also much easier and cheaper to store and transfer in a constrained Earth observation scenario, where the available data transfer time slot is usually very short.
We confirmed that for applications such as crop detection and monitoring, and possibly many similar ones, the spectral resolution of hyperspectral images can be safely reduced. It is important to be aware that there are disciplines and types of data where such a drop in resolution would lead to larger losses in accuracy. However, some level of reduction should always be applicable.
In principle, our approach could be used in conjunction with other band/feature selection and feature extraction algorithms, both to further improve the data size reduction rates and to improve classification performance. It could also be combined with other conventional machine learning and deep learning segmentation techniques. These findings proved valuable to hyperspectral image processing researchers and the IEEE Geoscience and Remote Sensing Society - we were invited to the IGARSS 2019 conference in Yokohama to present our work and discuss the challenges in autonomous hyperspectral image processing and data mining.