Benchmarking and scalability of machine-learning methods for photometric redshift estimation

Cited by: 23

Authors:

Henghes, Ben [1]

Pettitt, Connor [2]

Thiyagalingam, Jeyan [2]

Hey, Tony [2]

Lahav, Ofer [1]

Affiliations:

[1] UCL, Dept Phys & Astron, Gower St, London WC1E 6BT, England

[2] Rutherford Appleton Lab, Sci Comp Dept, Sci & Technol Facil Council STFC, Harwell Campus, Didcot OX11 0QX, Oxon, England

Source:

MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY | 2021 / Vol. 505 / No. 4

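Funding:

Science and Technology Facilities Council (UK); National Science Foundation (USA); European Research Council; Andrew W. Mellon Foundation (USA);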

Keywords:

methods: data analysis; galaxies: distances and redshifts; cosmology: observations; DIGITAL SKY SURVEY;

DOI:

10.1093/mnras/stab1513

Chinese Library Classification number:

P1 [Astronomy];

Subject classification code:

0704;

Abstract:

Obtaining accurate photometric redshift (photo-z) estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce photo-z estimations, there has been a shift towards using machine-learning techniques. However, there has not been as much of a focus on how well different machine-learning methods scale or perform with the ever-increasing amounts of data being produced. Here, we introduce a benchmark designed to analyse the performance and scalability of different supervised machine-learning methods for photo-z estimation. Making use of the Sloan Digital Sky Survey (SDSS DR12) data set, we analysed a variety of the most widely used machine-learning algorithms. By scaling the number of galaxies used to train and test the algorithms up to one million, we obtained several metrics demonstrating the algorithms' performance and scalability for this task. Furthermore, by introducing a new optimization method, time-considered optimization, we were able to demonstrate how a small concession of error can allow for a great improvement in efficiency. Of the algorithms tested, we found that the Random Forest performed best, with a mean squared error of MSE = 0.0042; however, as other algorithms such as Boosted Decision Trees and k-Nearest Neighbours performed very similarly, we used our benchmarks to demonstrate how different algorithms could be superior in different scenarios. We believe that benchmarks like this will become essential with upcoming surveys, such as the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST), which will capture billions of galaxies requiring photometric redshifts.
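The benchmarking procedure summarised in the abstract (grow the training set, record the error and the run time, then accept a little extra error in exchange for speed) can be illustrated with a small, self-contained sketch. Everything in it is an assumption for illustration, not the authors' pipeline: the catalogue is synthetic (five noisy "magnitudes" that drift with redshift, not SDSS DR12 photometry), a hand-written brute-force k-nearest-neighbours regressor stands in for the algorithms compared in the paper, and "time-considered optimization" is read here as picking the fastest configuration whose MSE lies within a fractional tolerance of the best MSE. The names `make_toy_catalogue`, `knn_predict`, `benchmark` and `time_considered_choice`, and the `tolerance` parameter, are hypothetical. The error metric is the usual MSE = (1/N) Σ (z_spec - z_phot)², with z_spec the spectroscopic (reference) redshift and z_phot the prediction.

```python
import heapq
import math
import random
import time


def mse(z_true, z_pred):
    """Mean squared error between reference (spectroscopic) and predicted (photometric) redshifts."""
    if not z_true or len(z_true) != len(z_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(z_true, z_pred)) / len(z_true)


def make_toy_catalogue(n, seed=0):
    """HYPOTHETICAL synthetic catalogue: five 'magnitudes' that drift linearly with
    redshift plus Gaussian noise. A stand-in for real photometry, NOT SDSS data."""
    rng = random.Random(seed)
    slopes = (2.0, 1.5, 1.0, 0.7, 0.5)  # assumed band-dependent redshift dependence
    mags, zs = [], []
    for _ in range(n):
        z = rng.uniform(0.0, 0.8)
        mags.append(tuple(20.0 + s * z + rng.gauss(0.0, 0.05) for s in slopes))
        zs.append(z)
    return mags, zs


def knn_predict(train_x, train_z, query, k):
    """Brute-force k-NN regression: average redshift of the k training galaxies
    closest to `query` in Euclidean magnitude space."""
    nearest = heapq.nsmallest(k, range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_z[i] for i in nearest) / k


def benchmark(train_sizes, n_test=100, ks=(1, 5, 25)):
    """For every (training size, k) pair, time the predictions on a fixed test set
    and record the resulting MSE. k-NN has no separate fitting step, so only
    prediction time is measured here."""
    test_x, test_z = make_toy_catalogue(n_test, seed=1)
    results = []
    for n in train_sizes:
        train_x, train_z = make_toy_catalogue(n, seed=2)
        for k in ks:
            start = time.perf_counter()
            preds = [knn_predict(train_x, train_z, q, k) for q in test_x]
            elapsed = time.perf_counter() - start
            results.append({"n_train": n, "k": k, "mse": mse(test_z, preds), "seconds": elapsed})
    return results


def time_considered_choice(results, tolerance=0.1):
    """ASSUMED reading of time-considered optimization: among configurations whose MSE
    is within `tolerance` (as a fraction) of the best MSE, return the fastest one."""
    best = min(r["mse"] for r in results)
    acceptable = [r for r in results if r["mse"] <= best * (1.0 + tolerance)]
    return min(acceptable, key=lambda r: r["seconds"])


if __name__ == "__main__":
    runs = benchmark(train_sizes=(250, 500, 1000))
    for r in runs:
        print(f"n_train={r['n_train']:5d}  k={r['k']:3d}  MSE={r['mse']:.5f}  time={r['seconds']:.3f}s")
    print("time-considered pick:", time_considered_choice(runs, tolerance=0.1))
```

With real data, the toy catalogue would be replaced by measured magnitudes and spectroscopic redshifts, and the regressors under comparison swapped in for `knn_predict`. The selection rule is the part that encodes the accuracy-versus-time trade-off: a larger `tolerance` favours faster, slightly less accurate configurations.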

Pages: 4847-4856

Number of pages: 10

Number of references: 40
