Extracted from Chimot, J., Global mapping of atmospheric composition from space – Retrieving aerosol height and tropospheric NO2 from OMI, PhD book, Delft University of Technology (TU Delft), The Royal Netherlands Meteorological Institute (KNMI), July 2018.

The atmospheric retrieval problem from a spectral satellite measurement generally faces at least three fundamental challenges: 1) the choice of the forward model itself leads to systematic errors, 2) the retrieval problem is ill-posed, and 3) the analytical equations required for the geophysical retrieval do not exist.

A forward model can be based on a full physical Radiative Transfer Model (RTM) or on a simpler formulation, such as the Beer-Lambert law used in Differential Optical Absorption Spectroscopy (DOAS).
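For reference, the Beer-Lambert law mentioned above is commonly written in the following form. The symbols are standard DOAS notation and are assumptions here, since they are not defined in this excerpt: $I_0$ is the incoming solar intensity, $I$ the measured intensity, $\sigma_i$ the absorption cross section of trace gas $i$, and $N_{s,i}$ its slant column density along the light path.

$$I(\lambda) = I_0(\lambda) \, \exp\left( - \sum_i \sigma_i(\lambda) \, N_{s,i} \right)$$

In a DOAS fit, broadband contributions such as scattering and surface reflection are usually absorbed into a low-order polynomial in wavelength, so that only the differential absorption structures constrain $N_{s,i}$.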

The measured backscattered solar light contains all the information on the atmospheric composition and the surface. However, this information cannot resolve all the fine structures of the atmosphere. The number of independent (*i.e.* uncorrelated) pieces of information, also called degrees of freedom in the framework of Rodgers (2000), is generally low. The atmospheric retrieval problem is therefore ill-posed: an infinite number of atmospheric states (or state vectors) can lead to the same measurement. As an example, the vertical profile of nitrogen dioxide (NO2) cannot be derived from space-borne sensors such as OMI or TROPOMI; instead, the total (stratosphere + troposphere) or tropospheric column density is estimated. This column represents the number of molecules per cm2 integrated over the considered atmospheric layers (up to the tropopause height for tropospheric NO2). Another illustration, frequently mentioned in this thesis, is the difficulty of distinguishing clouds, aerosols and bright surfaces from a single passive satellite measurement. Successful retrievals from passive sensors are obtained over clear pixels, and accurate cloud filtering is therefore a major requirement. However, in practice, small cloud residuals may persist and lead to systematic biases, as discussed in the next chapters.
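The ill-posedness described above can be illustrated with a minimal sketch, assuming a purely hypothetical linear forward model y = K x with two measurements and five atmospheric layers. The matrix `K` and the profiles `x_a` and `x_b` are invented for illustration and do not represent OMI or TROPOMI sensitivities.

```python
import numpy as np

# Hypothetical linear forward model y = K @ x:
# 2 measurements (rows), 5 atmospheric layers (columns).
K = np.array([
    [1.0, 0.9, 0.7, 0.4, 0.2],
    [0.2, 0.4, 0.7, 0.9, 1.0],
])

# The rank of K bounds the number of independent pieces of information.
rank = np.linalg.matrix_rank(K)

# With the full SVD, the rows of Vt beyond the rank span the null space of K:
# profile perturbations that the measurement cannot see.
_, _, Vt = np.linalg.svd(K)
null_vector = Vt[rank]

# Two different (hypothetical) profiles that differ only by a null-space component.
x_a = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
x_b = x_a + 0.5 * null_vector

y_a = K @ x_a
y_b = K @ x_b
same_measurement = np.allclose(y_a, y_b)

print("independent pieces of information:", rank)
print("different profiles, same measurement:", same_measurement)
```

Any multiple of a null-space vector can be added to the profile without changing the simulated measurement, which is why a reduced state vector (for example a column density) or additional prior knowledge is needed.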

Such ill-posed problems have two direct consequences: 1) the state vector must be carefully designed such that it is consistent with the actual number of independent pieces of information, and 2) accurate prior knowledge of the geophysical parameters contributing to the measured radiance (*i.e.* the set of forward model parameters b) is necessary to constrain the retrieval. For example, the retrieval of the tropospheric NO2 vertical column density from UV-visible satellite spectra must take into account the surface albedo, cloud and aerosol interferences, and the shape of the NO2 vertical profile in the vector b. We will see in the next chapters that their combined uncertainty is one of the most crucial factors affecting the tropospheric NO2 retrieval from OMI-like sensors.

While a radiative transfer model describes the dependency of the measurement y on the state vector x, a retrieval is an inverse model in which x depends on y. Such an inverse model typically has no analytical form. To address this, there are generally speaking two approaches: variational and statistical (Blackwell, 2005). The variational approach employs a full physical forward model on-line, *i.e.* for each single measurement y. An a priori state vector xa is explicitly used, with its associated prior error, in addition to the set b. It represents our statistically best prior knowledge of the geophysical parameters to be retrieved and is propagated through the forward model, thereby producing a simulation of the at-sensor radiance (Blackwell, 2005). The simulation and the measurement are compared, and the state vector is iteratively adjusted until the modeled radiance matches the observation. The minimization equation becomes (Blackwell, 2005):

$$x' = \min_{x} \left( \left\| S_y^{-1/2} \left( F(x, b') - y \right) \right\|^2 + \left\| S_a^{-1/2} \left( x - x_a \right) \right\|^2 \right),$$

with Sy the measurement error covariance matrix and Sa the *a priori* state vector covariance matrix (*i.e.* the best statistical knowledge of the uncertainty related to our prior state vector). The Optimal Estimation Methodology (OEM), as defined by Rodgers (2000), is probably the best-known variational approach. It seeks the statistically most likely solution by applying Bayes' theorem.
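The iterative adjustment described above can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: it assumes a Gauss-Newton iteration of the kind described by Rodgers (2000), applied to a hypothetical two-parameter, Beer-Lambert-like forward model. The weights `W` (standing in for the fixed parameters b), the covariances and the "true" state are all invented for the example.

```python
import numpy as np

# Hypothetical forward model parameters (play the role of the fixed set b).
W = np.array([[0.5, 0.1],
              [0.3, 0.3],
              [0.1, 0.6],
              [0.8, 0.2]])

def forward(x):
    """Toy forward model F(x, b): transmittance-like signal in 4 channels."""
    return np.exp(-W @ x)

def jacobian(x):
    """K = dF/dx, shape (4, 2)."""
    return -W * forward(x)[:, None]

def cost(x, y, x_a, Sy_inv, Sa_inv):
    """Measurement misfit plus prior term, both weighted by inverse covariances."""
    r = forward(x) - y
    d = x - x_a
    return r @ Sy_inv @ r + d @ Sa_inv @ d

def retrieve(y, x_a, Sy, Sa, max_iter=20, tol=1e-10):
    """Gauss-Newton iteration starting from the a priori state."""
    Sy_inv = np.linalg.inv(Sy)
    Sa_inv = np.linalg.inv(Sa)
    x = x_a.copy()
    for _ in range(max_iter):
        K = jacobian(x)
        lhs = K.T @ Sy_inv @ K + Sa_inv
        rhs = K.T @ Sy_inv @ (y - forward(x) + K @ (x - x_a))
        x_new = x_a + np.linalg.solve(lhs, rhs)
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    # Diagnostics evaluated at the final state.
    K = jacobian(x)
    S_post = np.linalg.inv(K.T @ Sy_inv @ K + Sa_inv)  # posterior covariance
    A = S_post @ K.T @ Sy_inv @ K                      # averaging kernel
    return x, S_post, np.trace(A)                      # trace(A) = degrees of freedom for signal

# Hypothetical experiment: noise-free synthetic measurement of a known state.
x_true = np.array([1.0, 0.5])
y = forward(x_true)
x_a = np.array([0.8, 0.8])
Sy = (0.01 ** 2) * np.eye(4)
Sa = (0.5 ** 2) * np.eye(2)

x_hat, S_post, dfs = retrieve(y, x_a, Sy, Sa)
print("retrieved state:", x_hat)
print("degrees of freedom for signal:", dfs)
```

In this well-constrained toy case the iteration converges in a few steps. For strongly non-linear forward models a plain Gauss-Newton step may fail to converge, which is where damping schemes such as Levenberg-Marquardt come in.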

The statistical approach is regression-based and does not explicitly use the forward model, which is run off-line. Instead, the estimation is based on an ensemble of radiance/state vector pairs (Rensemble and xensemble) and on a statistical characterization [p(x′), p(y), p(y|x′)]: the probabilities p of the event x′, of the event y, and of y conditional on the event x′. It can be summarized as (Blackwell, 2012):

$$x' = \min_{x} \left( \left\| x_{\mathrm{ensemble}} - F(R_{\mathrm{ensemble}}) \right\| \right)$$

In practice, such probability density functions are difficult to obtain, and alternative methods are employed, such as linear least squares, linear regression and non-linear least squares (Blackwell, 2005). Look-up Tables (LUT) and machine learning techniques such as Neural Networks (NN) or Principal Component Analysis (PCA) are special classes of the statistical approach (Atkinson and Tatnall, 1997; Blackwell, 2005). Contrary to the variational approach, statistical techniques do not fit the state vector iteratively. Combined with an off-line forward model, this generally gives the advantage of a fast processing time.
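A regression-based retrieval of this kind can be sketched as follows, assuming a hypothetical toy forward model and a simple linear least-squares fit. The forward model, the noise level, the sampling range of the ensemble and the choice of log-radiance features are all assumptions made for this example. They are not taken from Blackwell (2005) or from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward model parameters, used off-line only.
W = np.array([[0.5, 0.1],
              [0.3, 0.3],
              [0.1, 0.6],
              [0.8, 0.2]])

def forward(x):
    """Toy forward model: states (n, 2) -> radiances (n, 4)."""
    return np.exp(-x @ W.T)

# Off-line step: simulate an ensemble of state vectors and noisy radiances.
n_samples = 2000
x_ensemble = rng.uniform(0.2, 1.5, size=(n_samples, 2))
R_ensemble = forward(x_ensemble) + rng.normal(0.0, 0.005, size=(n_samples, 4))

def features(R):
    """Regression inputs: a constant term plus log-radiances (an assumed choice)."""
    return np.column_stack([np.ones(len(R)), np.log(R)])

# Training: linear least squares mapping radiance features to states.
coeffs, *_ = np.linalg.lstsq(features(R_ensemble), x_ensemble, rcond=None)

def retrieve(R):
    """On-line step: one matrix product, no iteration and no forward model call."""
    return features(np.atleast_2d(R)) @ coeffs

# Apply the trained regression to a new (hypothetical) measurement.
x_true = np.array([[1.0, 0.5]])
x_hat = retrieve(forward(x_true))
print("retrieved state:", x_hat)
```

The forward model is only called while building the ensemble. Each retrieval afterwards is a single matrix product, which is the source of the speed advantage. The regression always returns an answer, and it provides no per-measurement error estimate.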

The pros and cons of variational and statistical approaches are still debated within the scientific community. Methods like the OEM are expected to give a high accuracy for an individual measurement, together with a retrieval error estimate and some diagnostic variables (*e.g.* degrees of freedom for signal, or vertical averaging kernel). Furthermore, by using prior and posterior error matrices, they can take into account potential correlations between different error sources. However, because of the iterations required and the use of a full physical model, the computing time can be high, especially for a strongly non-linear problem. Moreover, there is no guarantee that the system will converge: the so-called “divergence” problem is significant with this approach (Sanders *et al.*, 2015). Although there are many techniques to mitigate this issue (*e.g.* the Levenberg-Marquardt approach), no simple solution exists. Finally, the result may depend strongly on the source of the prior knowledge and on how well its uncertainty is characterized.

The main advantage of a statistical approach is the fast computing time. Compared to the OEM, the dependence on prior information is lower although still significant: *e.g.* see the sensitivity of the neural network aerosol layer height to the aerosol model, or of the tropospheric NO2 air mass factor (AMF) to the geophysical parameters. By definition, such a system does not encounter divergence problems: it always gives a solution for each measurement (which, on the other hand, might be more or less accurate). However, it does not give an optimal solution for an individual measurement, but rather a statistically good one for an ensemble. No error estimate is directly available for each retrieval. The error correlations in the measurement y or the assumed forward model cannot be easily taken into account. In the case of a large LUT, the high memory consumption can become an important issue. Finally, there can be open questions about the degree of physical realism when using a machine learning approach. However, this last point actually depends on the generation of the training database and the technique used for the training process (*e.g.* over-training verification, identification and evaluation of the machine learning architecture, etc.).