Master Thesis defense by Ioannis Mageiras & Marina Koukouvaou

Title: Applying machine learning on Quasar Selection

Abstract: This work investigates how Machine Learning (ML) can be used effectively in the quasar selection process. The investigation produced a range of XGBoost models trained on features from different surveys, spanning the optical to the near- and mid-IR, as well as on astrometric features. The ability of ML to predict outliers, such as high-redshift, dust-reddened and Broad Absorption Line (BAL) quasars, is also tested and quantified through direct observations. Based on these observations, we argue that ML can automate and improve the quasar selection process, incorporating earlier selection techniques at its core but applying them more dynamically. This is possible because, during training, the model learns patterns and relations between the variables in a more flexible, multi-dimensional way than rigid empirical cuts and criteria can describe. We observe this trend across the different validation approaches: the ML models retrieve quasars that would have evaded other selection techniques, increasing completeness without necessarily decreasing the purity of the quasar catalogues. Some of the most interesting quasar candidates arising from the ML predictions are selected and observed spectroscopically at the Nordic Optical Telescope (NOT) in La Palma, Spain. Using Gaia observations alone, we construct a catalogue of 372,000 quasar candidates at galactic latitudes b > 40 degrees, with high completeness but a lower purity of ∼75%. Applying a more robust ML model that we trained, we provide a purer sub-catalogue of 42,300 quasar candidates with very high estimated completeness and purity (∼99%). Strict cuts on the proper-motion signal-to-noise ratio (S/N) are applied throughout the process to keep stellar contamination in our catalogues low.
Taking all of this into account, we re-evaluate the quasar surface density and propose a mean of ∼37 quasars per square degree for Gmag < 20. Another outcome of this work is a regression model that estimates photometric redshifts for the predicted quasars. With training-set redshifts capped at z = 5, this model predicts a quasar's redshift with a root-mean-square error of 0.28. Combined, the classification and regression tools offer a convenient way to classify astronomical sources, identify the quasars among them, and sort them into redshift bins. Quasars at different redshifts can thus be pinpointed and selected for observation, allowing their distant epochs, and the physics that characterizes them, to be studied.
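The proper-motion S/N cut mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis pipeline: it assumes the S/N is defined as sqrt((pmra/pmra_error)^2 + (pmdec/pmdec_error)^2), ignoring the correlation between the two components, and the threshold PM_SNR_MAX = 3.0 is a hypothetical placeholder. The field names mirror the Gaia archive columns pmra, pmdec, pmra_error and pmdec_error; the helper functions are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    """Minimal stand-in for a Gaia row; field names mirror Gaia archive columns."""
    source_id: int
    pmra: float         # proper motion in RA*cos(dec), mas/yr
    pmdec: float        # proper motion in Dec, mas/yr
    pmra_error: float   # mas/yr
    pmdec_error: float  # mas/yr


def pm_significance(s: Source) -> float:
    """Proper-motion S/N; ignores the pmra-pmdec correlation (an assumption)."""
    return math.hypot(s.pmra / s.pmra_error, s.pmdec / s.pmdec_error)


# Hypothetical threshold: sources whose proper motion is detected above this
# significance are treated as likely stars and rejected.
PM_SNR_MAX = 3.0


def consistent_with_zero_pm(sources, snr_max=PM_SNR_MAX):
    """Keep sources whose proper motion is consistent with zero, as expected
    for distant extragalactic objects such as quasars."""
    return [s for s in sources if pm_significance(s) < snr_max]


if __name__ == "__main__":
    catalogue = [
        Source(1, pmra=0.05, pmdec=-0.10, pmra_error=0.20, pmdec_error=0.25),  # quasar-like
        Source(2, pmra=12.0, pmdec=-8.0, pmra_error=0.15, pmdec_error=0.12),   # clearly moving star
    ]
    kept = consistent_with_zero_pm(catalogue)
    print([s.source_id for s in kept])  # → [1]
```

Accounting for the correlation (Gaia's pmra_pmdec_corr column) would replace the simple hypot with a full chi-squared built from the 2×2 proper-motion covariance matrix; the cut would otherwise work the same way.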

Supervisor:

  • Johan Fynbo, University of Copenhagen, Niels Bohr Institute

Censor (external examiner):

  • Frank Grundahl, Aarhus University, Department of Physics and Astronomy