We investigate the problem of finding meaningful geometric descriptions of data sets. The approach that we propose is based on diffusion processes. We show that by designing a local geometry that reflects some quantities of interest, it is possible to construct a diffusion operator whose eigendecomposition produces an embedding of the data into R^n via a diffusion map. In this space, the data points are reorganized so that the geometry combines all the local information captured by the diffusion process, and the Euclidean distance defines a diffusion metric that measures the proximity of points in terms of their connectivity. We pay particular attention to the case of submanifolds of R^n, and we show how to define different kinds of diffusions on these structures in order to recover their Riemannian geometry. General types of anisotropic diffusions are also addressed, and we explain their relevance to the study of differential and dynamical systems. We also introduce a special set of functions that we term "geometric harmonics". These functions make it possible to perform out-of-sample extensions of empirical functions defined on the data set. We show that the geometric harmonics, together with the corresponding restriction and extension operators, are a valuable tool for studying the relation between the intrinsic and extrinsic geometries of a set. In particular, they allow one to define a multiscale extension scheme in which empirical functions are decomposed into frequency bands, and each band is extended to a certain distance so that it satisfies some version of the Heisenberg principle. The work presented here was produced under the supervision of Prof. Raphy Coifman.
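
To make the diffusion map construction summarized above more concrete, the following is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the implementation used in this work: the Gaussian kernel, the scale parameter `epsilon`, the diffusion time `t`, the plain random-walk normalization, and the function name `diffusion_map` are all choices made here for the example. The sketch builds a Markov matrix from pairwise affinities, diagonalizes it through its symmetric conjugate, and returns coordinates in which the Euclidean distance matches the diffusion distance when all nontrivial eigenvectors are kept.

```python
import numpy as np


def diffusion_map(points, epsilon, t=1, n_components=2):
    """Sketch of a diffusion map built from a Gaussian kernel (an assumption).

    Returns (embedding, eigenvalues, markov_matrix, stationary_distribution).
    """
    X = np.asarray(points, dtype=float)

    # Local geometry: Gaussian affinities between all pairs of points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)

    # Row normalization gives a Markov (diffusion) matrix P = D^{-1} K.
    d = K.sum(axis=1)
    P = K / d[:, None]
    pi = d / d.sum()  # stationary distribution of the random walk

    # P is conjugate to the symmetric matrix S = D^{-1/2} K D^{-1/2},
    # so a symmetric eigensolver can be used.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P, scaled so that sum_i pi_i * psi_k(i)^2 = 1.
    psi = eigvecs / np.sqrt(pi)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and weight the
    # remaining coordinates by eigenvalue^t.
    lam = eigvals[1:n_components + 1]
    embedding = (lam ** t) * psi[:, 1:n_components + 1]
    return embedding, eigvals, P, pi


# Usage on a toy data set with two well-separated clusters (made-up points).
toy_points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                       [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
toy_embedding, toy_eigvals, _, _ = diffusion_map(toy_points, epsilon=2.0, t=2)
```

In this toy example, points in the same cluster end up close together in the embedding and points in different clusters end up far apart, which reflects the connectivity-based notion of proximity described above. Keeping only the first few coordinates gives an approximation of the diffusion distance whose quality depends on how fast the eigenvalues decay.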