Compressive spectral imaging (CSI) acquires a few random projections of a spectral image (SI), reducing acquisition, storage, and, in some cases, processing costs. This acquisition framework has been widely used in various tasks, such as target detection, video processing, and fusion. In particular, compressive spectral image fusion (CSIF) aims at obtaining a high spatial–spectral-resolution SI from two sets of compressed measurements: one from a hyperspectral image with high spectral but low spatial resolution, and one from a multispectral image with high spatial but low spectral resolution. Most approaches in the literature include prior information, such as global low rank, smoothness, and sparsity, to solve the resulting ill-posed CSIF inverse problem. More recently, the strong self-similarities exhibited by SIs have been successfully exploited, through a nonlocal low-rank (NLLR) prior, to improve the performance of CSI inverse problems. However, to the best of our knowledge, this NLLR prior has not been applied to the solution of the CSIF inverse problem. Therefore, this article formulates an approach that jointly includes the global low-rank, smoothness, and NLLR priors to solve the CSIF inverse problem. The global low-rank prior is introduced through the linear mixture model, which describes the SI as a linear combination of a few end-members weighted by their abundances. In this article, the end-members are either accurately estimated from the compressed measurements or initialized from a fast reconstruction of the hyperspectral image. The approach also assumes that the abundances preserve the smoothness and NLLR priors of the SI, so that the fused image is obtained from the end-members and abundances that minimize a cost function comprising two data-fidelity terms and two regularizations: the smoothness and the NLLR. Simulations over three data sets show that the proposed approach improves CSIF performance compared with literature approaches.
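The linear mixture model behind the global low-rank prior can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical sizes and random values, not the article's estimation procedure: an SI matrix built from a few end-members necessarily has rank at most the number of end-members.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 spectral bands, 100 pixels, and p = 4 end-members.
bands, pixels, p = 64, 100, 4

# End-member matrix E (bands x p) and nonnegative abundances A (p x pixels)
# whose columns sum to one, as the linear mixture model assumes.
E = rng.random((bands, p))
A = rng.random((p, pixels))
A /= A.sum(axis=0, keepdims=True)   # sum-to-one constraint on abundances

# Linear mixture model: each pixel spectrum is E times its abundance vector.
X = E @ A

# Global low-rank prior: X has rank at most p, far below min(bands, pixels).
rank = np.linalg.matrix_rank(X)
```

Estimating `E` from compressed measurements and regularizing `A` with smoothness and NLLR terms is where the article's actual contribution lies; the sketch only shows why the factorization itself enforces a global low rank.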
The trade-off between feature representation power and spatial localization accuracy is crucial for the dense classification/semantic segmentation of remote sensing images (RSIs). High-level features extracted from the late layers of a neural network are rich in semantic information yet have blurred spatial details; low-level features extracted from the early layers contain more pixel-level information but are isolated and noisy. It is therefore difficult to bridge the gap between high- and low-level features, owing to their differences in physical information content and spatial distribution. In this article, we contribute to solving this problem by enhancing the feature representation in two ways. On the one hand, a patch attention module (PAM) is proposed to enhance the embedding of context information based on a patchwise calculation of local attention. On the other hand, an attention embedding module (AEM) is proposed to enrich the semantic information of low-level features by embedding local focus from high-level features. Both proposed modules are lightweight and can be applied to process the extracted features of convolutional neural networks (CNNs). Experiments show that, by integrating the proposed modules into a baseline fully convolutional network (FCN), the resulting local attention network (LANet) greatly improves the performance over the baseline and outperforms other attention-based methods on two RSI data sets.
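A patchwise calculation of local attention can be illustrated with a toy NumPy sketch: within each non-overlapping patch, every position re-aggregates features using softmax similarity weights over the patch. The patch size, feature shapes, and the use of the feature map as query, key, and value are all illustrative assumptions, not the exact PAM design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_attention(F, patch=4):
    """Toy patchwise local attention: inside each patch, every position
    becomes a softmax-weighted combination of the patch's features.
    (A sketch in the spirit of PAM, not the published module.)"""
    H, W, C = F.shape
    out = np.empty_like(F)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            block = F[i:i + patch, j:j + patch].reshape(-1, C)  # (p*p, C)
            attn = softmax(block @ block.T / np.sqrt(C))        # local weights
            out[i:i + patch, j:j + patch] = (attn @ block).reshape(
                patch, patch, C)
    return out

rng = np.random.default_rng(9)
F = rng.random((8, 8, 16))          # hypothetical feature map
out = patch_attention(F)
```

Because each attention row sums to one, every output position is a convex combination of features from its own patch, which is what keeps the operation local and lightweight.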
Developed here is an algorithm for determining the infrared (IR) cloud-top phase from advanced Himawari imager (AHI) measurements aboard the Japanese geostationary satellite Himawari-8. The tests and decision tree used in the AHI algorithm differ from those in the Moderate Resolution Imaging Spectroradiometer (MODIS) Level-2 cloud product algorithm. Verification of AHI cloud-top phase results against the Cloud–Aerosol Lidar with Orthogonal Polarization (CALIOP) product over a four-month period from March to June of 2017 over the North Pacific gives hit rates of 80.20% (66.33%) and 86.51% (80.61%) for liquid-water and randomly oriented-ice cloud tops, respectively, if clear-sky pixels are excluded (included) from the statistics. Intercomparisons are also made between AHI and MODIS IR cloud-top phase products over the North Pacific in June 2017. AHI liquid-water-phase determinations agree with MODIS liquid-water-phase determinations at a rate of 83.68%, showing a dependence on MODIS zenith angles. The agreement rate of ice-phase classifications between AHI and MODIS is 93.54%. The MODIS IR product contains some unreasonable ice-phase pixels over oceans, as well as uncertain-phase pixels over land, and has limitations for daytime liquid-water-phase identification over the Indo-China Peninsula. Limitations of the AHI cloud-top phase algorithm are mainly caused by difficulties in identifying liquid-water-phase clouds over sun-glint regions and during twilight.
In the last five years, deep learning has been introduced to tackle hyperspectral image (HSI) classification and has demonstrated good performance. In particular, convolutional neural network (CNN)-based methods for HSI classification have made great progress. However, due to the high dimensionality of HSIs and the equal treatment of all bands, the performance of these methods is hampered by learning features from bands that are useless for classification. Moreover, for patchwise CNN models, the equal treatment of spatial information from the pixel-centered neighborhood also hinders performance. In this article, we propose an end-to-end residual spectral–spatial attention network (RSSAN) for HSI classification. The RSSAN takes raw 3-D cubes as input data without additional feature engineering. First, a spectral attention module is designed for spectral band selection from the raw input data by emphasizing bands useful for classification and suppressing useless ones. Second, a spatial attention module is designed for the adaptive selection of spatial information by emphasizing pixels in the pixel-centered neighborhood that belong to the same class as the center pixel, or that are useful for classification, and suppressing those from a different class or that are useless. Third, the two attention modules are also used in the subsequent CNN for adaptive feature refinement in spectral–spatial feature learning. Finally, a sequential spectral–spatial attention module is embedded into a residual block to avoid overfitting and accelerate the training of the proposed model. Experimental studies demonstrate that the RSSAN achieved superior classification accuracy compared with the state of the art on three HSI data sets: Indian Pines (IN), University of Pavia (UP), and Kennedy Space Center (KSC).
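The idea of emphasizing useful bands and suppressing useless ones can be sketched as a squeeze-and-excitation-style gate: pool each band spatially, pass the result through a tiny two-layer network, and reweight the cube with the resulting per-band scores. The layer sizes and random weights below are placeholders, not the RSSAN architecture.

```python
import numpy as np

def spectral_attention(cube, w1, w2):
    """Toy spectral attention gate over an (H, W, B) hyperspectral patch.
    w1, w2 stand in for learned weights. Returns the reweighted cube and
    the per-band weights in (0, 1)."""
    # Squeeze: global average over the spatial dims -> one value per band.
    z = cube.mean(axis=(0, 1))                     # (B,)
    # Excitation: small bottleneck producing per-band gates.
    h = np.maximum(w1 @ z, 0.0)                    # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))            # sigmoid, (B,)
    # Emphasize useful bands, suppress useless ones.
    return cube * s, s

rng = np.random.default_rng(1)
H, W, B, r = 7, 7, 16, 4                           # hypothetical sizes
cube = rng.random((H, W, B))
w1 = rng.standard_normal((r, B))
w2 = rng.standard_normal((B, r))
out, weights = spectral_attention(cube, w1, w2)
```

In a trained network the gate learns to push the weights of discriminative bands toward 1 and of noisy bands toward 0; here the weights are arbitrary because `w1` and `w2` are random.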
Hyperspectral (HS) pansharpening, as a special case of the superresolution (SR) problem, aims to obtain a high-resolution (HR) image from the fusion of an HR panchromatic (PAN) image and a low-resolution (LR) HS image. Though HS pansharpening based on deep learning has developed rapidly in recent years, it remains a challenging task because of the following requirements: 1) a unified model with the goal of fusing two images of different dimensions should enhance spatial resolution while preserving spectral information; 2) all the parameters should be trained adaptively without manual adjustment; and 3) a model with good generalization should overcome the sensitivity to different sensor data at reasonable computational complexity. To meet these requirements, we propose an HS pansharpening framework based on a 3-D generative adversarial network (HPGAN) in this article. The HPGAN induces the 3-D spectral–spatial generator network to reconstruct the HR HS image from the newly constructed 3-D PAN cube and the LR HS image. It searches for an optimal HR HS image by successive adversarial learning to fool the introduced PAN discriminator network. The loss function is specifically designed to comprehensively consider global, spectral, and spatial constraints. Besides, the proposed 3-D training in the high-frequency domain reduces the sensitivity to different sensor data and extends the generalization of HPGAN. Experimental results on data sets captured by different sensors illustrate that the proposed method can successfully enhance spatial resolution and preserve spectral information.
As one of the most important algorithms in target detection, constrained energy minimization (CEM) has been widely used and developed in recent years. However, it is easy to verify that the target detection result of CEM varies with the data origin, which is apparently unreasonable, since the distribution of the target of interest is objective and, therefore, unrelated to the selection of data origin. The clever eye (CE) algorithm tries to solve this problem by adding the data origin as a new variable from the perspective of the filter output energy. However, due to the nonconvexity of the objective function, CE can only obtain locally optimal solutions by using the gradient ascent method. In this article, we arrive at a striking conclusion: there exists an analytical solution for CE that corresponds to the solution of a linear equation, and we further prove that all the solutions of the linear equation are globally optimal.
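The classical CEM filter that this line of work builds on has a well-known closed form: minimize the filter output energy w^T R w subject to w^T d = 1, giving w = R^{-1} d / (d^T R^{-1} d). The sketch below shows that baseline with random placeholder data; the CE analytical solution from the article is not reproduced here.

```python
import numpy as np

def cem_filter(X, d, eps=1e-6):
    """Classical CEM filter.

    X: (N, B) pixels-by-bands data; d: (B,) target signature.
    Note R is built from the raw data, so shifting the data origin
    changes R -- the origin dependence that CE/this article address."""
    R = X.T @ X / X.shape[0] + eps * np.eye(X.shape[1])  # sample correlation
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d @ Rinv_d)

rng = np.random.default_rng(2)
X = rng.random((500, 10))   # hypothetical scene: 500 pixels, 10 bands
d = rng.random(10)          # hypothetical target signature
w = cem_filter(X, d)
```

By construction the filter passes the target with unit gain (`w @ d == 1`) while suppressing the background energy, which is the constraint the derivation starts from.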
Conventional low-rank (LR)-based hyperspectral image (HSI) denoising models generally convert high-dimensional data into 2-D matrices or simply treat such data as 3-D tensors. However, these pure LR- or tensor low-rank (TLR)-based methods lack the flexibility to consider the different correlation information along different HSI directions, which leads to the loss of comprehensive structure information and of the inherent spatial–spectral relationship. To overcome these shortcomings, we propose a novel multidirectional LR modeling and spatial–spectral total variation (MLR-SSTV) model for removing HSI mixed noise. By incorporating the weighted nuclear norm, we obtain the weighted sum of weighted nuclear norm minimization (WSWNNM) and the weighted sum of weighted tensor nuclear norm minimization (WSWTNNM) to estimate a more accurate LR tensor and, in particular, to better remove dead-line noise. Gaussian noise is further suppressed, and local spatial–spectral smoothness is preserved effectively, by the SSTV regularization. We develop an efficient algorithm for solving the derived optimization problem based on the alternating direction method of multipliers (ADMM). Extensive experiments on both synthetic and real data demonstrate the superior performance of the proposed MLR-SSTV model for HSI mixed-noise removal.
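The workhorse behind weighted nuclear-norm minimization is weighted singular-value thresholding: shrink each singular value by its own weight and reassemble the matrix. The sketch below applies it to a synthetic low-rank-plus-noise matrix with a uniform weight; the actual WSWNNM/WSWTNNM terms combine several such steps across tensor unfoldings.

```python
import numpy as np

def weighted_svt(Y, weights):
    """Weighted singular-value thresholding: the proximal step of
    (weighted) nuclear-norm minimization."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)   # shrink each singular value
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(3)
L = rng.random((40, 3)) @ rng.random((3, 40))   # rank-3 ground truth
Y = L + 0.01 * rng.standard_normal((40, 40))    # small Gaussian noise
w = np.full(40, 0.5)                            # uniform weight (illustrative)
L_hat = weighted_svt(Y, w)
```

With the noise singular values well below the threshold, all of them are zeroed out and the estimate recovers the low-rank structure exactly in rank, which is why SVT-style steps remove structured noise such as dead lines so effectively.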
Class-wise adversarial adaptation networks are investigated for the classification of hyperspectral remote sensing images in this article. Through adversarial learning between the feature extractor and multiple domain discriminators, domain-invariant features are generated. Moreover, a probability-prediction-based maximum mean discrepancy (MMD) method is introduced into the adversarial adaptation network to achieve superior feature-alignment performance. The class-wise adversarial adaptation in conjunction with the class-wise probability MMD is denoted as the class-wise distribution adaptation (CDA) network. The proposed CDA does not require labeled information in the target domain and can achieve an unsupervised classification of the target image. Experimental results using the Hyperion and Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral data demonstrate its effectiveness.
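The MMD criterion used for feature alignment compares the mean embeddings of two sample sets under a kernel. A minimal NumPy sketch with a Gaussian RBF kernel and synthetic "source" and "target" features (not the article's probability-weighted class-wise variant):

```python
import numpy as np

def gaussian_mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X (n, d) and
    Y (m, d) under a Gaussian RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(4)
same_a = rng.standard_normal((200, 3))           # source-domain features
same_b = rng.standard_normal((200, 3))           # same distribution
shifted = rng.standard_normal((200, 3)) + 2.0    # a shifted "target domain"

close = gaussian_mmd2(same_a, same_b)   # near zero: distributions match
far = gaussian_mmd2(same_a, shifted)    # large: distributions differ
```

Minimizing such a term drives the extracted source and target features toward the same distribution, which is the alignment role MMD plays alongside the adversarial loss.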
Recently, convolutional neural network (CNN)-based hyperspectral image (HSI) classification has enjoyed high popularity due to its appealing performance. However, using 2-D or 3-D convolution in a standalone mode may be suboptimal in real applications. On the one hand, 2-D convolution overlooks the spectral information in extracting feature maps. On the other hand, 3-D convolution suffers from heavy computation in practice and seems to perform poorly in scenarios with analogous textures along consecutive spectral bands. To solve these problems, we propose a mixed CNN with covariance pooling for HSI classification. Specifically, our network architecture starts with spectral–spatial 3-D convolutions followed by a spatial 2-D convolution. Through this mixture operation, we fuse the feature maps generated by the 3-D convolutions along the spectral bands to provide complementary information and reduce the channel dimension. In addition, the covariance pooling technique is adopted to fully extract second-order information from the spectral–spatial feature maps. Motivated by the channel-wise attention mechanism, we further propose two principal component analysis (PCA)-involved strategies, channel-wise shift and channel-wise weighting, to highlight the importance of different spectral bands and recalibrate the channel-wise feature response, which can effectively improve the classification accuracy and stability, especially with limited sample sizes. To verify the effectiveness of the proposed model, we conduct classification experiments on three well-known HSI data sets: Indian Pines, University of Pavia, and Salinas Scene. The experimental results show that our proposal, although it has fewer parameters, achieves better accuracy than other state-of-the-art methods.
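Covariance pooling itself is simple to state: collapse an (H, W, C) feature map into a C x C covariance matrix over spatial positions, so second-order channel interactions survive the pooling. A minimal sketch with a random feature map (the network that produces the features is not reproduced):

```python
import numpy as np

def covariance_pooling(features):
    """Second-order pooling: an (H, W, C) feature map becomes the
    C x C sample covariance of its spatial positions."""
    H, W, C = features.shape
    F = features.reshape(H * W, C)
    F = F - F.mean(axis=0, keepdims=True)   # center each channel
    return F.T @ F / (H * W - 1)

rng = np.random.default_rng(5)
fmap = rng.random((9, 9, 8))    # hypothetical feature map
cov = covariance_pooling(fmap)
```

Unlike average pooling, which keeps only per-channel means, the covariance matrix is symmetric positive semidefinite and encodes how channels co-vary, which is the second-order information the classifier exploits.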
Blind hyperspectral unmixing is the process of expressing the measured spectrum of a pixel as a combination of a set of spectral signatures called endmembers and simultaneously determining their fractional abundances in the pixel. Most unmixing methods are strictly spectral and do not exploit the spatial structure of hyperspectral images (HSIs). In this article, we present a new spectral–spatial linear mixture model and an associated estimation method based on a convolutional neural network autoencoder unmixing (CNNAEU). The CNNAEU technique exploits both the spatial and the spectral structure of HSIs for endmember and abundance map estimation. As it works directly with patches of HSIs and does not use any pooling or upsampling layers, the spatial structure is preserved throughout, and abundance maps are obtained as feature maps of a hidden convolutional layer. We compared the CNNAEU method to four conventional and three deep learning state-of-the-art unmixing methods using four real HSIs. Experimental results show that the proposed CNNAEU technique performs particularly well and consistently in endmember extraction and outperforms all the comparison methods.
The fusion of hyperspectral (HS) and multispectral (MS) images to obtain high-resolution HS (HRHS) images is a very challenging task. A series of solutions has been proposed in recent years. However, the structural self-similarity within the HS image has not been fully exploited. In this article, we present a novel HS and MS image-fusion method based on nonlocal low-rank tensor approximation and sparse representation. Specifically, the HS image and the MS image are considered the spatially and spectrally degraded versions of the HRHS image, respectively. Then, a nonlocal low-rank constraint term is adopted to capture the nonlocal similarity and the spatial–spectral correlation. Meanwhile, we add a sparse constraint term to describe the sparsity of the abundances. Thus, the proposed fusion model is established, and its optimization is solved by the alternating direction method of multipliers (ADMM). The experimental results on three synthetic data sets and one real data set show the advantages of the proposed method over several state-of-the-art competitors.
Blind hyperspectral unmixing (BHU) is an important technology for decomposing mixed hyperspectral images (HSIs) and is, in fact, an ill-posed problem. The ill-posedness of BHU is aggravated by nonlinearity, endmember variability (EV), and abnormal points, which are currently considered three challenging, intractable interferences. To sidestep these challenges, we present a novel unmixing model in which a latent multidiscriminative subspace is explored and the inherent self-expressiveness property is employed. Unlike most existing unmixing approaches, which directly decompose the HSI using original features in a single interference-corrupted subspace, our model seeks the underlying intrinsic representation and simultaneously reconstructs the HSI based on the learned latent subspace. With the help of both clustering homogeneity and intrinsic feature selection, structural differences in the HSI and the spectral property of a given material are fully exploited, and an ideal multiheterogeneous subspace is recovered from the heavily contaminated original HSI. Based on the multiheterogeneous subspace, the reconstructed differentiated transition matrix is split into two matrices to avoid the emergence of artificial endmembers. Experiments are conducted on synthetic and four representative real HSI sets, and all the experimental results demonstrate the validity and superiority of our proposed method.
The use of hyperspectral (HS) data is growing over the years, thanks to its very high spectral resolution. However, HS data are still characterized by a spatial resolution that is too low for several applications, thus motivating the design of fusion techniques aimed at sharpening HS images with high spatial resolution data. To reach a significant resolution enhancement, high-resolution images should be acquired by different satellite platforms. In this article, we highlight the pros and cons of employing real multiplatform data, using the EO-1 satellite as an exemplary case. The spatial resolution of the HS data collected by the Hyperion sensor is improved by exploiting both the ALI panchromatic image collected from the same platform and acquisitions from the WorldView-3 and QuickBird satellites. Furthermore, we tackle the problem of assessing the final quality of the fused product at the nominal resolution, which presents further difficulties in this general environment. Useful indications for the design of an effective sharpening method in this case are finally outlined.
In hyperspectral image (HSI) classification, spatial context has demonstrated its significance in achieving promising performance. However, conventional spatial-context-based methods simply assume that spatially neighboring pixels should correspond to the same land-cover class, so they often fail to correctly discover the contextual relations among pixels in complex situations, leading to imperfect classification results on irregular or inhomogeneous regions such as class boundaries. To address this deficiency, we develop a new HSI classification method based on the recently proposed graph convolutional network (GCN), as it can flexibly encode the relations among arbitrarily structured non-Euclidean data. Different from traditional GCNs, our method adopts two novel strategies to further exploit the contextual relations for accurate HSI classification. First, since the receptive field of a traditional GCN is often limited to a fairly small neighborhood, we propose to capture long-range contextual relations in the HSI by performing successive graph convolutions on a learned region-induced graph transformed from the original 2-D image grid. Second, we refine the graph edge weights and the connective relationships among image regions simultaneously by learning an improved similarity measurement and an “edge filter,” so that the graph can be gradually refined to adapt to the representations generated by each graph convolutional layer. Such an updated graph will, in turn, result in more faithful region representations, and vice versa. The experiments carried out on four real-world benchmark data sets demonstrate the effectiveness of the proposed method.
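A single graph-convolution step of the kind stacked here is commonly written H = ReLU(D^{-1/2}(A + I)D^{-1/2} X W): each node (image region) aggregates its neighbors' features through a degree-normalized adjacency. A minimal NumPy sketch with a random graph (the learned region graph and edge filter from the article are not reproduced):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One normalized graph-convolution layer:
    H = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(6)
n, f_in, f_out = 6, 4, 3                    # hypothetical region graph
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                                 # symmetric, no self-loops yet
X = rng.random((n, f_in))                   # per-region features
W = rng.standard_normal((f_in, f_out))      # layer weights
H = gcn_layer(A, X, W)
```

Stacking such layers grows the receptive field one hop per layer, which is why successive convolutions on a region-level graph can capture long-range context that a single pixel-level layer cannot.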
Sparse representation-based graph embedding methods have been successfully applied to dimensionality reduction (DR) in recent years. However, these approaches usually become problematic for hyperspectral images (HSIs), which contain complex nonlinear manifold structures. Inspired by recent progress in manifold learning and the hypergraph framework, a novel DR method named local constraint-based sparse manifold hypergraph learning (LC-SMHL) is proposed to simultaneously discover the manifold-based sparse structure and the multivariate discriminant sparse relationships of HSIs. The proposed method first designs a new sparse representation (SR) model, named local constrained sparse manifold coding (LCSMC), by fusing local constraints and manifold reconstruction. Then, two manifold-based sparse hypergraphs are constructed with the sparse coefficients and label information. Based on these hypergraphs, LC-SMHL learns an optimal projection for mapping data into a low-dimensional space in which the embedded features not only reveal the manifold structure and sparse relationships of the original data but also possess strong discriminant power for HSI classification. Experimental results on three real HSI data sets demonstrate that the proposed LC-SMHL method achieves better performance in comparison with some state-of-the-art DR methods.
Sparse unmixing, as a semisupervised unmixing method, has attracted extensive attention. The process of sparse unmixing involves treating the mixed pixels of hyperspectral imagery as a linear combination of a small number of spectral signatures (endmembers) in a standard spectral library, associated with fractional abundances. Over the past ten years, to achieve a better performance, sparse unmixing algorithms have begun to focus on the spatial information of hyperspectral images. However, less accurate spatial information greatly limits the performance of the spatial-regularization-based sparse unmixing algorithms. In this article, to overcome this limitation and obtain more reliable spatial information, a novel sparse unmixing algorithm named superpixel-based reweighted low-rank and total variation (SUSRLR-TV) is proposed to enhance the performance of the traditional spatial-regularization-based sparse unmixing approaches. In the proposed approach, superpixel segmentation is adopted to consider both the spatial proximity and the spectral similarity. In addition, a low-rank constraint is enforced on the objective function as pixels within each superpixel have the same endmembers and similar abundance values, and they naturally satisfy the low-rank constraint. Differing from the traditional nuclear norm, a reweighted nuclear norm is used to achieve a more efficient and accurate low-rank constraint. Meanwhile, low-rank consideration is also used to enhance the spatial continuity and suppress the effects of random noise. Furthermore, TV regularization is introduced to promote the smoothness of the abundance maps. Experiments on three simulated data sets, as well as a well-known real hyperspectral imagery data set, confirm the superior performance of the proposed method in both the qualitative assessment and the quantitative evaluation, compared with the state-of-the-art sparse unmixing methods.
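The core regression inside any sparse-unmixing method of this family is an L1-regularized least-squares fit of each pixel against the spectral library, typically solved with proximal iterations. The sketch below runs plain ISTA on one synthetic pixel; the SUSRLR-TV model adds the superpixel low-rank and TV terms on top of this step, which are not reproduced here.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_unmix(D, y, lam=0.05, n_iter=500):
    """ISTA for min_a 0.5*||y - D a||^2 + lam*||a||_1 over a library D."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * D.T @ (D @ a - y), step * lam)
    return a

rng = np.random.default_rng(7)
D = rng.random((50, 20))            # library: 50 bands, 20 signatures
a_true = np.zeros(20)
a_true[[2, 9]] = [0.6, 0.4]         # pixel mixes two library endmembers
y = D @ a_true                      # noise-free synthetic pixel
a_hat = ista_sparse_unmix(D, y)
```

Each iteration is a gradient step on the data-fidelity term followed by soft-thresholding, which drives most abundance coefficients to exactly zero, matching the premise that a pixel mixes only a few library signatures.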
We propose a novel graph Laplacian-guided coupled tensor decomposition (gLGCTD) model for the fusion of a hyperspectral image (HSI) and a multispectral image (MSI) for spatial and spectral resolution enhancement. The coupled Tucker decomposition is employed to capture the global interdependencies across the different modes to fully exploit the intrinsic global spatial–spectral information. To preserve local characteristics, the complementary submanifold structures embedded in the high-resolution (HR)-HSI are encoded by graph Laplacian regularizations. The global spatial–spectral information captured by the coupled Tucker decomposition and the local submanifold structures are incorporated into a unified framework. The gLGCTD fusion problem is solved by a hybrid of proximal alternating optimization (PAO) and the alternating direction method of multipliers (ADMM). Experimental results on both synthetic and real data sets demonstrate that the gLGCTD fusion method is superior to state-of-the-art fusion methods, with a more accurate reconstruction of the HR-HSI.
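The Tucker decomposition underlying the coupled model writes a tensor as a small core multiplied by one factor matrix per mode via mode-n products. A minimal NumPy sketch that builds a Tucker-format tensor with hypothetical sizes (the coupling between HSI and MSI measurements and the Laplacian terms are not reproduced):

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: multiply matrix M (J x I_n) along mode n
    of tensor T, the basic operation of the Tucker decomposition."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(8)
# Small core G and one factor matrix per mode (height, width, spectrum).
G = rng.random((3, 3, 2))
U1, U2, U3 = rng.random((10, 3)), rng.random((12, 3)), rng.random((6, 2))

# X = G x_1 U1 x_2 U2 x_3 U3: the Tucker reconstruction.
X = mode_n_product(mode_n_product(mode_n_product(G, U1, 0), U2, 1), U3, 2)
```

The core size bounds the multilinear rank: every mode-n unfolding of `X` has rank at most the corresponding core dimension, which is how the decomposition encodes global spatial–spectral interdependencies compactly.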
Remote sensing opens opportunities to assess spatial patterns in ecological data for a wide range of ecosystems. This information can be used to design sampling strategies for fieldwork more effectively, either to capture the maximum spatial dependence in the ecological data or to avoid it completely. The sampling design and the autocorrelation observed in the field determine whether a spatial model is needed to predict ecological data accurately. In this article, we show the effects of different sampling designs on predictions of a plant trait, as an example of an ecological variable, using a set of simulated hyperspectral data with an increasing range of spatial autocorrelation. Our findings show that when the sample is designed to estimate population parameters such as the mean and variance, a random design is appropriate even in the presence of strong spatial autocorrelation. However, in remote sensing applications, the aim is usually to predict characteristics of unsampled locations using spectral information. In this case, regular sampling is a more appropriate design. Sampling based on close pairs of points clustered over a regular design may improve the accuracy of the training model, but this design generalizes poorly. The use of spatially explicit models improves prediction accuracy significantly in landscapes with strong spatial dependence. However, such models have a low capacity to generalize to other landscapes with different spatial patterns. When the combination of design and sample size results in sample distances similar to the range of the spatial dependence in the field, prediction uncertainty increases.
In this article, we focus on tackling the problem of weakly supervised object detection in high-spatial-resolution remote sensing images, which aims to learn detectors with only image-level annotations, i.e., without object location information during the training stage. Although promising results have been achieved, most approaches often fail to provide high-quality initial samples and thus struggle to obtain optimal object detectors. To address this challenge, a dynamic curriculum learning strategy is proposed to progressively learn the object detectors by feeding training images whose increasing difficulty matches the current detection ability. To this end, an entropy-based criterion is first designed to evaluate the difficulty of localizing objects in images. Then, an initial curriculum that ranks training images in ascending order of difficulty is generated, in which easy images are selected to provide reliable instances for learning object detectors. With the gained stronger detection ability, the subsequent order of the curriculum for retraining detectors is adjusted accordingly by promoting previously difficult images to easy ones. In this way, the detectors are well prepared, by training on easy images, to learn from more difficult ones, and thus gradually improve their detection ability more effectively. Moreover, an effective instance-aware focal loss function is developed for detector learning to alleviate the influence of low-quality positive instances and, meanwhile, enhance the discriminative information of class-specific hard negative instances. Comprehensive experiments and comparisons with state-of-the-art methods on two publicly available data sets demonstrate the superiority of our proposed method.
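The standard focal loss that the instance-aware variant builds on is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t): the (1 - p_t)^gamma factor down-weights easy, confidently classified examples. A minimal sketch with made-up probabilities (the article's per-instance reweighting is not reproduced):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss.

    p: predicted foreground probabilities; y: 0/1 labels."""
    p_t = np.where(y == 1, p, 1.0 - p)          # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -(alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

p = np.array([0.9, 0.6, 0.1])   # easy positive, hard positive, easy negative
y = np.array([1, 1, 0])
losses = focal_loss(p, y)
```

The easy positive (p = 0.9) contributes far less loss than the hard positive (p = 0.6), so training focuses on the difficult instances, the same intuition the instance-aware variant extends with proposal-quality weights.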
Reliable and accurate ship detection in optical remote sensing images is still challenging; even state-of-the-art convolutional neural network (CNN)-based methods cannot obtain fully satisfactory results. To locate ships in diverse orientations more accurately, some recent methods conduct the detection via a rotated bounding box. However, this further increases the difficulty of detection, because an additional variable, the ship orientation, must be accurately predicted by the algorithm. In this article, a novel CNN-based ship-detection method is proposed that overcomes some common deficiencies of current CNN-based methods. Specifically, to generate rotated region proposals, current methods have to predefine multioriented anchors and predict all unknown variables together in one regression process, limiting the quality of the overall prediction. By contrast, we predict the orientation and the other variables independently, and yet more effectively, with a novel dual-branch regression network, based on the observation that ship targets are nearly rotation-invariant in remote sensing images. Next, a shape-adaptive pooling method is proposed to overcome the limitation of typical regular region-of-interest (ROI) pooling in extracting the features of ships with various aspect ratios. Furthermore, we propose to incorporate multilevel features via spatially variant adaptive pooling. This novel approach, called multilevel adaptive pooling, leads to a compact feature representation better suited to simultaneous ship classification and localization. Finally, a detailed ablation study of the proposed approaches is provided, along with some useful insights. Experimental results demonstrate the clear superiority of the proposed method in ship detection.