Land surface evapotranspiration (ET) is an important component of the surface energy budget and the water cycle. To resolve the spatial‐scale mismatch between in situ observations and remotely sensed ET, the most appropriate upscaling approach must be identified for acquiring ground truth ET data at the satellite pixel scale. Based on a data set from two flux observation matrices in the midstream and downstream reaches of the Heihe River Basin, six upscaling methods were intercompared via direct validation and cross validation. Over homogeneous underlying surfaces, the area‐weighted method performed better than the other five upscaling methods, which introduce auxiliary variables (the integrated Priestley‐Taylor equation, weighted area‐to‐area regression kriging [WATARK], artificial neural network, random forest [RF], and deep belief network methods). Over moderately heterogeneous underlying surfaces, the WATARK method performed better, whereas over highly heterogeneous underlying surfaces, the RF method performed better. A combined method (the area‐weighted method for homogeneous surfaces, the WATARK method for moderately heterogeneous surfaces, and the RF method for highly heterogeneous surfaces) was proposed to acquire daily ground truth ET data at the satellite pixel scale, and the errors in these ground truth ET data were evaluated. The Dual Temperature Difference (DTD) and ETMonitor were validated against the ground truth ET data, an approach that resolves the spatial‐scale mismatch and quantifies the uncertainties in the validation process.
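
As an illustration only, the area‐weighted step and the method selection of the combined approach can be sketched in a few lines of Python. The sketch assumes that each station's weight is the fraction of the satellite pixel area it represents, and that the surface has already been assigned to one of three heterogeneity classes. The function names, the class labels, and the numbers in the usage note are hypothetical and are not taken from the study.

```python
from typing import Sequence


def area_weighted_et(station_et: Sequence[float],
                     area_fractions: Sequence[float]) -> float:
    """Upscale station ET to one pixel value as an area-weighted mean.

    Assumption: area_fractions[i] is the share of the pixel area that
    station i represents. The weights are renormalized by their sum, so
    they do not have to add up to exactly 1.
    """
    if len(station_et) != len(area_fractions):
        raise ValueError("station_et and area_fractions differ in length")
    total = sum(area_fractions)
    if total <= 0:
        raise ValueError("area fractions must sum to a positive value")
    weighted_sum = sum(et * w for et, w in zip(station_et, area_fractions))
    return weighted_sum / total


def select_upscaling_method(heterogeneity: str) -> str:
    """Return the method the combined approach assigns to a surface class.

    The class labels are hypothetical; how a surface is classified is
    outside the scope of this sketch.
    """
    methods = {
        "homogeneous": "area-weighted",
        "moderately heterogeneous": "WATARK",
        "highly heterogeneous": "RF",
    }
    if heterogeneity not in methods:
        raise ValueError(f"unknown heterogeneity class: {heterogeneity!r}")
    return methods[heterogeneity]
```

For example, with two stations reporting 4.0 and 2.0 (in whatever daily ET unit is used) that represent 75% and 25% of a pixel, `area_weighted_et([4.0, 2.0], [0.75, 0.25])` gives 3.5. The WATARK and RF branches are only named here, because both require auxiliary variables and fitted models that this sketch does not attempt to reproduce.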