Master Thesis Defense by Mikkel Peter Lemming
Title: Analysing Damped Lyman-alpha and Metal Absorption in Simulated Quasar Using Machine Learning
Abstract: The Sloan Digital Sky Survey (SDSS) excluded quasar spectra based on their brightness and reddening, leading to a biased dataset that may skew the inferred metal content at some redshifts. The 4MOST group aims to create an unbiased dataset to study the metal distribution throughout the lifetime of the universe. This new survey is expected to observe a large number of quasars, all of which will need to be analyzed. This thesis aims to speed up this analysis using machine learning models trained on data simulated to resemble the spectra expected from the 4MOST survey. In total, five models were created that together can find damped Lyman-alpha absorbers (DLAs) and sub-DLAs with log N(HI) ≥ 20.0, estimate their redshift and column density, and estimate the column densities of metals in the same system. The metals analyzed in this thesis are CII, AlII, FeII, SiII and MgII. For each model, different input shapes and sizes, preprocessing methods and architectures were tested to optimize it for its individual task. The models identify whether or not a DLA is present in an image with an accuracy of 98.8%, finding 99.2% of all DLAs while producing a false positive in 1.7% of the images. The DLA redshift is estimated such that 75% of the predictions lie within ±2.3 × 10⁻³ of the true value; using the metal absorption lines refines this to about ±9.6 × 10⁻⁴. The column densities of DLAs and sub-DLAs are predicted with a standard deviation of 0.17 dex, and those of the metals with standard deviations ranging from 0.29 to 0.47 dex depending on the ion, with MgII and CII being harder to estimate than the other three. Because the models are intended for real data, we test whether models trained on simulated data can analyze SDSS spectra that have already been analyzed by other means. We find that the models perform reasonably well when the resolution of the simulated data matches that of the SDSS test input.
Taking this as a proof of concept, we expect the models to be useful for analyzing future 4MOST spectra, provided their resolution matches that of the simulated spectra.
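The headline numbers above are standard summary statistics. As a hedged illustration (not the thesis code), the sketch below shows one way to compute them from arrays of true and predicted values; the function names are hypothetical, and it assumes that the false-positive rate is counted over all images, that the redshift figure is the 75th percentile of the absolute redshift error, and that the "standard deviation in dex" is the spread of log10 column-density residuals.

```python
import numpy as np


def detection_metrics(y_true, y_pred):
    """Accuracy, completeness and false-positive fraction for DLA detection.

    Labels are 1 (DLA present) or 0 (no DLA). Assumption: the false-positive
    fraction is normalised by the total number of images.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)           # DLAs correctly found
    fp = np.sum(~y_true & y_pred)          # DLAs reported where none exist
    accuracy = np.mean(y_true == y_pred)   # fraction of correct yes/no calls
    completeness = tp / np.sum(y_true)     # fraction of true DLAs recovered
    fp_fraction = fp / y_true.size         # fraction of images with a false positive
    return float(accuracy), float(completeness), float(fp_fraction)


def redshift_error_75(z_true, z_pred):
    """Bound that contains 75% of the absolute redshift errors |z_pred - z_true|."""
    dz = np.abs(np.asarray(z_pred, dtype=float) - np.asarray(z_true, dtype=float))
    return float(np.percentile(dz, 75))


def std_dex(logn_true, logn_pred):
    """Standard deviation of log10 column-density residuals, in dex."""
    resid = np.asarray(logn_pred, dtype=float) - np.asarray(logn_true, dtype=float)
    return float(np.std(resid))


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not results from the thesis.
    print(detection_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
    print(redshift_error_75([2.0, 2.1, 2.2, 2.3], [2.001, 2.1, 2.203, 2.302]))
    print(std_dex([20.0, 21.0], [20.1, 20.9]))
```

With the toy detection labels, three of five calls are correct (accuracy 0.6), two of three DLAs are found, and one of five images receives a false positive. A percentile rather than a standard deviation is a natural choice for the redshift error because it is less sensitive to a few catastrophic misidentifications.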
Supervisor:
- Lise Christensen, University of Copenhagen, Niels Bohr Institute
Censor:
- Frank Grundahl, Aarhus University