Inverse Problem Approach for the Alignment of Electron Tomographic Series.

— Approche proble`me inverse pour l’alignement de se´ries en tomographie e´lectronique — Dans le domaine du rafﬁnage, les mesures morphologiques de particules sont devenues indispensables pour caracte´riser les supports de catalyseurs. A` travers ces parame`tres, on peut remonter aux spe´ciﬁcite´s physico-chimiques des mate´riaux e´tudie´s. Une des techniques d’acquisition utilise´es est la tomographie e´lectronique (ou nanotomographie). Des volumes 3D sont reconstruits a` partir de se´ries de projections sous diffe´rents angles obtenues par Microscopie E´lectronique en Transmission (MET). Cette technique permet d’obtenir une re´elle information tridimensionnelle a` l’e´chelle du nanome`tre. Un proble`me majeur dans ce contexte est le mauvais alignement des projections qui contribuent a` la reconstruction. Abstract — Inverse Problem Approach for the Alignment of Electron Tomographic Series — In the reﬁning industry, morphological measurements of particles have become an essential part in the characterization catalyst supports. Through these parameters, one can infer the speciﬁc physico-chemical properties of the studied materials. One of the main acquisition techniques is electron tomography (or nanotomography). 3D volumes are reconstructed from sets of projections from different angles made by a Transmission Electron Microscope (TEM). This technique provides a real three-dimensional information at the nanometric scale. A major issue in this method is the misalign-ment of the projections that contributes to the reconstruction. The current alignment techniques usu-ally employ ﬁducial markers such as gold particles for a correct alignment of the images. When the use of markers is not possible, the correlation between adjacent projections is used to align them. However, this method sometimes fails. In this paper, we propose a new method based on the inverse problem approach where a certain criterion is minimized using a variant of the Nelder and Mead simplex algorithm. The proposed approach is composed of two steps. The ﬁrst step consists of an initial alignment process, which relies on the minimization of a cost function based on robust statistics mea-suring the similarity of a projection to its previous projections in the series. It reduces strong shifts resulting from the acquisition between successive projections. In the second step, the pre-registered projections are used to initialize an iterative alignment-reﬁnement process which alternates between (i) volume reconstructions and (ii) registrations of measured projections onto simulated projections computed from the volume reconstructed in (i). At the end of this process, we have a correct reconstruction of the volume, the projections being correctly aligned. Our method is tested on simulated data and shown to estimate accurately the translation, rotation and scale of arbitrary transforms. We have successfully tested our method with real projections of different catalyst supports.


INTRODUCTION
The term "tomography" refers to all methods of exact reconstruction or -most often -approached reconstruction of the interior of an object from its projections; in other words methods for obtaining information on the composition of an object from the measurements taken outside the object.
Electron tomography (EM) [1] is a very powerful characterization technique for the reconstruction of the 3D nanoscale structure of objects from a series of two-dimensional projections. A series of 2D TEM projections is acquired by tilting the specimen at various angles (usually in the range ± 70°, one projection every degree) around an axis perpendicular to the electron beam (Fig. 1).
The geometry of acquisition is parallel (i.e. the electron beam which crosses the sample is rectilinear). In such parallel configurations, simplest reconstruction techniques recast the volume reconstruction into a series of independent 2D reconstructions, each corresponding to a slice perpendicular to the detector. Three main approaches have been developed in tomography: Filtered Back-Projections (FBP) [2], algebraic reconstruction methods [3-5] and algorithms based on Fourier transform [6,7].
These reconstruction methods require a precise alignment of the different projections. Because of mechanical imprecision and magnetic lenses defocus, neighboring projections may differ by a shift, a slight tilt and a change in magnification [8]. Currently, the most common alignment technique uses markers' tracking [9][10][11]. This method uses gold nanoparticles spread onto the surface of the specimen prior to imaging; these particles can be localized very accurately, even at high tilt angles, thanks to their round shape and their sharp contrast. Alignment with markers has two advantages. First, since markers positions measured over the full range of tilt angles are fit to a single set of projection equations, the alignment of the series of projections is guaranteed to be globally consistent. Secondly, the method can be adapted to correct anisotropic and non-uniform changes of the specimen during the tilt series [8]. However, the fiducial markers method has several practical disadvantages. It can be difficult to obtain an appropriate distribution of markers on the specimen, i.e. a distribution as homogeneous as possible, a necessary condition for proper alignment. For high-resolution reconstruction (e.g. reconstructed volume with voxels <1 nm 3 ), the size of gold nanoparticles (approximately 5 nm in diameter) becomes considerable and troublesome by masking an important part into the body of the reconstruction. Another disadvantage of markers is the need to track their positions accurately, which can be a very costly step. Certain approaches [12][13][14] are based on points of interest automatically extracted from the images, these points are then used to find the alignment parameters.
We deal here with the case where no such markers are used for alignment. This case can be handled by crosscorrelation methods [15][16][17][18]. The principle of these techniques is based on the alignment between two images. Precisely, the first image of a series of projections is chosen as the reference image, then each image is aligned with the previous image in the series, thereby sequentially compensating image shifts throughout the entire series. Moreover, the accumulation of errors in the estimated parameters is a disadvantage of these methods based on pairwise alignment of images. To overcome this defect, a 3D model-based method is proposed by Dengler [19], in which the alignment parameters were refined by alternating a reconstruction step and an alignment step between the modeled projections and real projections. This method has been developed by several authors [20][21][22]. In cryoEM for the biological sciences, the 3D model-based method is known as a projection matching [23,24], which also yields excellent results for X-ray tomography [25].
In this paper, we propose a new method for the alignment of TEM image series without the need for fiducial markers, which recovers in an accurate manner the changes in translation, rotation and scaling parameters. The alignment procedure consists of two stages: -first, we use an optimization approach to align the series of projections. The aim is to reduce the strong shifts, resulting from the acquisition between successive projections, and to facilitate the following step; -the pre-aligned projections are then used to initialize an iterative procedure which alternatively restores the 3-D object and accurately aligns the projections.
1 INITIAL ALIGNMENT

Alignment Between Two Images
Prior to the general case (global alignment of M images), one needs to build an alignment method for 2 images.
Four transformation parameters are required: horizontal and vertical translations, rotation and scaling. These parameters define how an image I t to be registered is transformed into a reference image I r . We propose a method that is more robust than cross-correlation based approaches (see Appendix 2). This method finds the parameters of the geometrical transformations by minimizing the Mean Squared Error (MSE) between I r and I t : with: The u i are the pixels' coordinates, R Á/ is a linear operator which interpolates its argument in order to apply a geometrical transformation of the image specified by the parameters Á/ ¼ ðÁu x ; Áu y ; Áu; ÁsÞ corresponding respectively to the horizontal and vertical shifts of a translation, the angle of a rotation and the magnification scale (Sect. 1.2). N is the number of pixels in the domain of interest (see Fig. 2), which depends on Á/. In our application, Áu x 2 ½ÀW =2; W =2 and Áu y 2 ½ÀH=2; H=2 with W, H the width and height of the image. The Nelder and Mead simplex algorithm [26], implemented as described in [27], is used to solve (1). This derivative-free optimization method evaluates iteratively EðÁ/Þ until a minimum is found. By simply changing the definition of EðÁ/Þ, our approach can be readily adapted to a large class of similar problems. For instance, we have modified our method to use the SSIM  (Structural SIMilarity) index [28] for finding the best parameters. We however found that the resulting algorithm is unstable for some pairs of images; for that reason, we advocate to use the MSE criterion given in Equation (2). Nevertheless, as shown by Figure 6, in section 1.5.3, minimizing the MSE turns out to also yield a better SSIM than conventional registration methods.
The initialization step deserves some explanations. The shifts Áu x ; Áu y are pre-estimated by means of cross-correlation. The rotation and scale parameters between two successive TEM images are very small: Áu does not exceed AE2 and Ás is in the range AE0:5%; therefore, we start the optimization with Áu ½0 ¼ 0, Ás ½0 ¼ 0 (neither rotation nor scaling change).
Note that, since R Á/ is an interpolation operator which continuously depends on the parameters Á/, minimizing EðÁ/Þ achieves sub-pixel accuracy for the shift parameters.

Image Transformation
The linear operator R Á/ in Equation (2) corresponds to the change of coordinates given by the matrix: ð3Þ which depends on Á/ ¼ ðÁu x ; Áu y ; Áu; ÁsÞ. The relation v ¼ M Á/ Á u changes the coordinates u ¼ ðu x ; u y ; 1Þ T in the initial image into the coordinates v ¼ ðv x ; v y ; 1Þ T in the transformed image. The transform R Á/ is an interpolation operator. On the basis of many experiments, we have found that cubic B-spline interpolation [29] gives better results than nearest neighbors or linear interpolation methods.

Alignment of a Series of Projections
We now turn to the case of a series of projections. The first image is chosen as the reference image. By applying our alignment method for two images, each image is aligned with the previous image in the series, thereby sequentially compensating image shifts throughout the entire series. This method minimizes the following cost function: where M is the number of images in the sequence, N t is the number of pixels in the domain of interest between I t and I tþ1 , / is a set of transformation parameters's vectors: The form of function (4) allows parameters Á/ t ¼ ðÁu t;x ; Áu t;y ; Áu t ; Ás t Þ associated with each pair of images to be determined in parallel. Figure 3 shows the relation between the transformation parameters in a series of images. c t ¼ u t;x ; u t;y ; u t ; s t À Á ; t ¼ 1; . . . ; M À 1 are the pseudo-transformation parameters between images 2; . . . ; M and the first (reference) image. The components of c t are:

Evaluation of the Alignment Accuracy
The SSIM index [28] measures the similarity between two images; we use it in order to check the efficiency of our method. The SSIM score, between À1 and 1, achieves its maximum value SSIM = 1 if and only if both images are identical. In our application, the SSIM index gives a degree of similarity in the domain of interest between the reference image and the registered image.
Correspondence between the transformation parameters of the image series.

Experimental Results
For the following results, we use a series of TEM projections of a standard zeolite powder (CBV712 from Zeolyst). Our algorithm has been implemented and tested with Yorick (http://yorick.sourceforge.net/) on 2.6-GHz Intel Core 2 Duo machine. The computation time required for registration an image pair varies depending on the image size and the richness of texture content in the images.

Case Without Noise
In a first test, a 256 Â 256 reference image I r of 8-bit grayscale (Fig. 4a) is transformed into a new image I t by applying a transformation R Á/ to I r with Á/ ¼ ðÀ5; 5; À10 ; À3%Þ. The corresponding SSIM ðI r ; I t Þ is 0.85. I a is the image after registration of I t on I r using the proposed method. The residual image between I a and I r is shown in Figure 4b, with SSIM ðI r ; I a Þ = 0.99. We return to the case of TEM images: Áu and Ás are very small. We generate a series of 140 random transformations, each consisting of a variation: Áu x , Áu y 2 ½À30; 30, Áu 2 ½À0:5; 0:5, Ás 2 ½À0:01; 0:01. We apply this transformation to I r to create a series of test images. We then attempt to register each test image to I r . The accuracy of the estimated translations (Áu Ã t;x , Áu Ã t;y ) is given by computing the mean shift error: for the estimated rotation (Áu Ã t ) and scale (Ás Ã t ): M is the number of images. We show these values in Table 1.

Noisy Case
In this test, image I t (Fig. 5a) is created by applying a transformation R Á/ to I r (Fig. 4a), with Á/ ¼ ðÀ5; 5; À10 ; À3%Þ and adding Gaussian noise with zero mean and a standard deviation of r 5:0 pixels. The corresponding SSIM ðI r ; I t Þ is 0.83. Once I r and I t have been aligned, the residual image is very satisfactory ( Fig. 4b, with SSIM ðI r ; I a Þ = 0.97). We now add Gaussian noise (r 5:0 pixels) to each image of the series of test images, which is used in Section 1.5.1. The acquired images are then registered to I r . We show the accuracy of the estimated translations, rotations and scales in Table 1.

Alignment of a Series of Projection
We have registered a series of TEM projections of size 256 Â 256 of a zeolite catalyst support with our registration method (Sect. 1.3). The series contains 142 projections; the angle of tilt h varies from À71 to þ70 with a þ1 increment. The projection which corresponds to h ¼ À71 , is presented in Figure 4a. We have compared the presented method to a robust standard method [18] which sequentially performs translation, rotation and scale registration. The cubic spline method [29] has been used for all interpolation procedures.
In Figure 6, higher score is better, the symbols (Ã) represent SSIM factors for the alignment of each pair of images using cross-correlation based approach, while the symbols () and the solid line (-) represent the a) b) Figure 4 a) Reference image of zeolite catalyst support I r , b) difference between reference image I r and aligned image I a . Gaussian noise (r 5:0) 0.262 1:9 Â 10 À2 3:4 Â 10 À3 SSIM factors for our method based on minimizing MSE or maximizing SSIM respectively. The presented registration method gives a higher similarity between the reference image and the registered image than the robust standard sequential approach. We show in Figure 7 the MSE factors corresponding to the alignment of each pair of images by different methods, it shows clearly that the proposed methods have less errors with respect to the standard method.
We conclude from Figures 6 and 7 that finding the best parameters by minimizing MSE or maximizing SSIM are essentially equivalent, while the accuracy obtained by a standard robust sequential method is lower.
For each pair of images, our method converges in 70 iterations on average, with total time $6.5 s, while the other method needs 4 s.

JOINT RECONSTRUCTION AND REFINED REGISTRATION
Typically, the tomography problem is represented by the relationship between the observed image (projections measurements) and the object to be reconstructed, which can be represented by the model: where I t 2 R m corresponds to tomographic projections, which is observed on the detector (for the t th projection), x 2 R n are the so-called voxels describing the object, H / t 2 R mÂn is a linear projection operator that characterizes how the projections are obtained from the object; the / t 2 R 6 are orientation and position parameters of  Values of the SSIM index for registrations obtained by standard robust sequential method (green Ã) [18] and by the proposed method based on: minimizing MSE (red ), or maximizing SSIM (blue -), evidencing a systematic higher scoring for our method. a) Image to be registered, b) residual image between I r and I a .
the object with respect to the instrument (source þ detector) for the acquisition of the t th projection (see Appendix 1). In Equation (7), the term e t 2 R m represents the errors due to the measurement noise and to the approximations of the model.

Solution of the Inverse Problem
The solution of the inverse problem is obtained by minimizing a cost function with respect to all voxels x and to all orientation parameters # ¼ f/ t g M t¼1 , where M is the number of projections: For statistically independent measures, the cost function is given by: with f t the likelihood term of the t th projection. For example, for a Gaussian noise: where the weight matrix is the inverse of the covariance matrix of the noise: W t ¼ Cov ðe t Þ À1 . The function f prior ðxÞ strengthens the priori on the voxels x; the functions c t ð/ t Þ introduce knowledge (measured or a priori) on the orientation parameters. A direct resolution of the problem as given by Equation (8) is impractical because it depends on many heterogeneous parameters (voxels, translations and angles). Moreover, the cost function is multimodal. In principle, a global optimization method is necessary. We therefore split this difficult problem into sub-problems easier to solve and for which we have effective methods of resolution.

Hierarchical Optimization
For given positional parameters #, finding the best voxels amounts to a reconstruction formally given by: By plugging this solution into the cost function, we obtain a criterion depending only on #: The best positioning parameters are then obtained by solving an optimization problem of smaller size: The solution of the global problem is then given by: The reconstruction step, given by Equation (11), can be performed by an existing algorithm such as Filtered BackProjection (FBP) or by an algebraic reconstruction method from the pre-aligned projections ( Sect. 1.3). However, the hierarchical optimization method is computationally too expensive to be applied directly. To accelerate the process, we use an alternating optimization approach (Sect. 2.3) which can however be sub-optimal compared to a hierarchical optimization.

Alternating Optimization
This method alternately estimates the voxels x for given positioning parameters # and then estimates the parameters # for given voxels x. This amounts to alternately perform volume reconstruction, Equation (11), and registration. As the voxels are considered fixed during the registration stage, each image can be registered independently (in parallel). In addition (see Appendix 1), the alignment of a projection can be done in a rather fast way by a re-interpolation of the projection model. 1. Initialization. Choose initial orientation parameters # ½0 and let k ¼ 0.

2.
Reconstruction. Estimate the voxels given the positioning parameters # ½k : 3. Alignment. For each projection, seek the best positioning parameters, with fixed voxels x ½kþ1 : For a separable cost function f like the one in Equation (9), the parameters / t associated with each projection are determined independently (that is, in parallel): Results are aggregated into: 4. Convergence test. If the method has converged (e.g., the maximum magnitude of the translation alignment parameters is less than 1.0 pixels for two consecutive iterations), stop the iterations, otherwise increment k and return to step 2.

Results
In this section, we describe experimental results on the testing of our method using two data sets: a synthetic generated data and some series of TEM projections of standard zeolite powder. The computation time required for alternating optimization process depending separately on the time needed by the registration and the time spent for a reconstruction.

Algorithm Testing with Simulation Data
We suppose that we want to reconstruct one nanoparticle that has a single composition, embedded in a homogeneous support. Figure 8a shows a cross-section of the sample, orthogonal to the rotation axis of the tilt stage. The series contains 142 simulated projections (256 Â 256 pixels 2 ) are computed from À71 to þ70 , using angular steps of þ1 . For simulation the misaligned images, due to the displacements of the sample, each image in the series is randomly transformed: horizontal and vertical shifts amount of at most AE30% of image dimensions, slightly rotations (does not exceed AE0:5 ) and small magnification changes (in the range AE1:0%). To make the simulation more realistic, Gaussian noise with zero mean and a standard deviation of r 5:0 pixels is added to each of the projection images. The coarse alignment process (Sect. 1.5.3) is applied on the simulation projection images. The aim is to reduce the strong shifts. From these pre-aligned projections, a first reconstructed volume is obtained by minimizing Equation (15) with a Quasi-Newton optimization algorithm: the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method, combined with total-variation regularization [30]. We use a numerical model of projection (projector) based on cubic B-splines [31], which provides much less approximation errors than the distance driven projector [32]. Figure 8b shows a cross-section of the reconstructed volume corresponding to the cross-section shown in Figure 8a. The shape of the reconstructed particle is clearly distorted, due to the accumulation of errors in the coarse alignment process. This reconstructed volume is used to calculate the simulated views, which are then considered as reference images, that are matched with each initial projection. This process is repeated, the quality of the reconstruction has improved considerably in few iterations. Figure 8c shows the cross-section of the reconstruction using 2 iterations of refined registration process. It is already very clear that the quality of the reconstruction has improved considerably. Only 6 iterations are necessary to archive a good reconstruction (Fig. 8d), which is nearly perfect with respect to the original phantom. The quality improvement of the reconstructed volume at different iterations of alignment process is shown in Figure 9.
In the current implementation of the code, the time spent for a reconstruction by optimization using L-BFGS, is about 3 hours for a volume of 256 Â 256 Â 256 voxels. This time can be reduced considerably by performing the reconstruction step by a standard reconstruction method as FBP, SIRT, ART, etc.; however, these methods can not perform regularized reconstruction, which is necessary to reduce artifacts due to missing projections (limited angle geometry).

Algorithm Testing with Experimental Data
In order to better understand the porosity of the zeolite powder (CBV712 from Zeolyst), we used electron tomography to evaluate the full 3D structure of the material. The tilt-series for the tomographic reconstruction was acquired on a TEM JEOL 2100F. The first series of projections contains 142 images of a zeolite powder, which were acquired semi-automatically over a tilt range varying from À71 to þ70 . The projection which corresponds to h ¼ À71 , is shown in Figure 4a. The series of images were aligned using our proposed approach.
We show in Figure 10a,b the isosurfaces of the final reconstruction, which is obtained after 6 iterations of the joint reconstruction and refined registration process. Figure 10c shows a cross-section in the yz direction of the final reconstruction which shows that even small details are reconstructed accurately.
Our second test used an other series of a zeolite powder, which contains 141 images from À71 to þ69 . The projections recorded at h ¼ À71 ; 20 are shown in Figure 11a,b. We apply the same process of alignment. Figure 11c,d illustrate the isosurfaces of the final reconstruction obtained after 6 steps of iterations of registration process.

CONCLUSION
An automatic robust registration method using an inverse problem approach has been presented. Our experimental results demonstrate that the proposed method yields accurate translation, rotation and scaling parameters for electron tomographic series without needing fiducial markers. Values of the SSIM index and MSE between the original phantom and the reconstructed volume at different iterations. a) θ = −71°b) θ = 20°c ) d) Figure 11 a,b) The real projections. c,d) The isosurfaces of the final reconstruction obtained after 6 steps of iterations of the joint reconstruction and refined registration process.
We are now working on a strategy to cope with missing projections by taking into account priors such as having a piecewise constant object with a very limited number of phases. This strategy will be integrated in the alternating optimization process to improve the quality of the reconstructed object in spite of instrumental jitter.

APPENDIX 1 Alignment of a Projection
For a given projection, there are 6 orientation parameters / t 2 R 6 : three translational ones: X t ; Y t ; Z t and three rotational ones corresponding to three Euler angles b t ; w t ; a t . Since the considered system performs parallel projections, by an adequate choice of the axes (two axes OX ; OY aligned with the pixels of the detector and the third one, OZ in the normal direction), 2 terms of translation X t ; Y t correspond to a translation of the projection; the third one, Z t , has an impact on the magnification of the projection; the angle of rotation a t (around OZ) corresponds to a simple rotation of the projection in the detector plane (Fig. A1).
For 4 orientation parameters (X t ; Y t ; Z t ; a t ), the effects on projection can be obtained by simple interpolation. It remains two rotational angles b t ; w t whose variation respectively around OY ; OZ requires to recalculate the projection of the voxels. In mathematical terms: where R is a linear transformation similar to a 2D interpolation (translation, rotation and magnification) for a variation of parameters Á/ t 2 Sð/ t Þ belonging to some subspace Sð/ t Þ 2 R 4 . This property should be exploited to accelerate the calculations. Otherwise, the effects of translation on the projection can be calculated for all possible translations with a pixel size resolution using a small number of FFT [33].

The Relationship Between the Presented Method and the Maximum Correlation
In order to keep things simple, we deal here with the case where two images I 1 and I 2 are misaligned only along a single dimension (OX). In such a case, the cost function is reduced to: EðÁu; aÞ ¼ X N i¼1 w i Á ½I 1 ðu i Þ À a Á I 2 ðu i À ÁuÞ 2 N is the number of pixels in the domain of interest (see Fig. 2), Áu the shift in position, a is a factor taking into account the attenuation, and w is a weighting function (which may be a function of u i and u i À Áu the positions of the pixels that correspond in the two images). Minimizing EðÁu; aÞ with respect to a: Tilt geometry: ðX ; Y ; ZÞ coordinate system fixed. OZ is the optical axis. The OX ; OY axes are parallel to the detector pixel rows and columns. The specimen port tilts about the tilt axis and angle h. Due to mechanical imprecision, the specimen port may be shifted about X t ; Y t ; Z t and slightly tilted about b t ; w t ; a t (along/around OX ; OY ; OZ respectively). As the first term of E þ ðÁuÞ does not depend on Áu, E þ ðÁuÞ is minimized with respect to Áu if and only if QðÁuÞ is maximized. Under the following assumptions: 1. the weights are constant (i.e. the noise level is the same for all pixels), 2. the denominator of QðÁuÞ is almost thesame whatever Áu, 3. and there is no contrast inversion between the two images (i.e. a þ ðÁuÞ > 0), the maximization of QðÁuÞ is equivalent to maximizing: CðÁuÞ ¼ X N i¼1 I 1 ðu i Þ Á I 2 ðu i À ÁuÞ (the numerator of a þ ðÁxÞ under the above assumptions) which is nothing else than the cross-correlation between the two images. While the 3rd assumption is reasonable, the two others are more obvious: the noise level may depend on the pixel and, if there are any structures in the images, the denominator of QðÁuÞ depend on Áu. Note that, if there are no such structures, registration is worthless so, at least, the 2nd assumption does not apply. In fact, EðÁu; aÞ can be seen as the opposite of the log-likelihood of the data given the model assuming Gaussian noise (not necessarily uniform). Thus, our approach derives from the maximum likelihood method by making less approximations (in particular the second one) than the maximum correlation method. For this reason, our method is likely to be superior.