Papers

Exploring the Infrared Variable Sky with Machine Learning

N. Miller, P. W. Lucas, Y. Sun
The last decade in astronomy has seen the growth of time-series data and, with it, the emergence of large surveys. Surveys such as PTF (Law et al., 2009), ZTF (Bellm et al., 2019), CoRoT (Auvergne et al., 2009), HOYS (Froebrich et al., 2018) and VVV (Minniti et al., 2010) provide large amounts of large-area, multi-epoch data. Such surveys bring a multitude of new issues, many of which take the form of ‘unknown unknowns’. Novel techniques are therefore required to analyse these data properly. Manual analysis is unfeasible, and hence efforts have been made to develop tools that automate large portions of the data analysis. The new dimension of study afforded by these surveys allows us to probe the formation, evolution and death of stars in unique ways.
A fundamental question arises: “How can we completely and robustly extract information from modern astronomical time-series data?” Answering this question requires the development of novel methods and the improvement of those already established. In doing so, I aim to expand and explain the demographics of variable stars in the Milky Way. By coupling more sensitive and robust identification methods with more thorough and complete analysis, I aim to identify and characterise new and known stellar classes, and so provide a more complete and accurate view of the Milky Way, its structure and demographics.
Key contributions of this thesis include the development of a neural network-based false alarm probability (NN FAP) method, which significantly improves the identification of periodic variables in large-scale surveys like VVV, LSST, and TESS. This method generates a universally comparable and unbiased FAP, making it applicable across various types of variable stars and leading to a more complete view of the demographics of periodic variable stars.
The creation of the PeRiodic Infrared Milky-way VVV Star-catalogue (PRIMVS) underscores the effort to identify periodic variable stars comprehensively and without bias. Utilising the VVV survey’s depth and breadth, PRIMVS processed over 86 million candidate variable sources using multiple period-finding methods and the novel neural network-based false alarm probability, leading to the identification of approximately 5 million periodic variables. Moreover, the thesis introduces a contrastive learning approach based on the SimCLR framework with a gated recurrent unit (GRU) backbone, specifically designed to handle stochastically sampled time-series data. This method improves variable star classification by creating semantically meaningful embeddings, enabling more nuanced and accurate analysis. Additionally, the integration of VVV data with Gaia astrometry enhances distance measurements to star-forming regions, while the use of Denoising Diffusion Probabilistic Models (DDPMs) for generating synthetic light curves provides a novel solution for developing extensive training sets.
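
As a rough illustration of the DDPM idea mentioned above, the sketch below applies the standard closed-form forward (noising) step, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, to a toy normalised light curve. The linear noise schedule, the step count, and the names `linear_alpha_bars` and `noise_light_curve` are assumptions made for this example only; the thesis abstract does not specify them, and the learned reverse (denoising) network is not shown.

```python
import math
import random

def linear_alpha_bars(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * t / (n_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_light_curve(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) by mixing the clean curve with Gaussian noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

# Toy normalised light curve (hypothetical data): one cycle of a sinusoid.
x0 = [math.sin(2 * math.pi * k / 50) for k in range(50)]
rng = random.Random(0)
alpha_bars = linear_alpha_bars()
slightly_noised = noise_light_curve(x0, 0, alpha_bars, rng)    # almost unchanged
fully_noised = noise_light_curve(x0, 999, alpha_bars, rng)     # close to pure noise
```

A generative model of this kind is trained to reverse these steps, so that new synthetic light curves can be drawn starting from pure noise.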

The verification of periodicity with the use of recurrent neural networks

N. Miller, P. W. Lucas, Y. Sun, Z. Guo, W. J. Cooper, C. Morris
The ability to automatically and robustly verify periodicity in time-series astronomical data is becoming more important as data sets rapidly increase in size. The age of large astronomical surveys has rendered manual inspection of time-series data less practical. Previous efforts to generate a false alarm probability for verifying the periodicity of stars have focused on the analysis of a constructed periodogram. However, the outputs of these methods correlate with properties that do not pertain to periodicity, such as light curve shape, slow trends and stochastic variability. The common assumption that photometric errors are Gaussian and well determined is a further limitation of analytic methods. We present a novel machine learning-based technique that directly analyses the phase-folded light curve to obtain its false alarm probability. We show that the results of this method are largely insensitive to the shape of the light curve, and we establish minimum values for the number of data points and the amplitude-to-noise ratio.
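
For readers unfamiliar with the term, the phase-folded light curve that the method analyses can be produced as in the minimal sketch below. The function name `phase_fold`, the `epoch` parameter and the toy data are hypothetical and chosen for illustration; the neural network that consumes the folded curve is not shown.

```python
import math

def phase_fold(times, mags, period, epoch=0.0):
    """Return (phase, magnitude) pairs sorted by phase; assumes times >= epoch."""
    if period <= 0:
        raise ValueError("period must be positive")
    folded = [(((t - epoch) / period) % 1.0, m) for t, m in zip(times, mags)]
    folded.sort(key=lambda pair: pair[0])
    return folded

# Toy example (hypothetical data): an irregularly sampled sinusoid, period 2.5 days.
true_period = 2.5
times = [0.0, 0.7, 1.9, 3.1, 4.4, 6.05, 7.3, 9.8, 11.2, 13.65]
mags = [12.0 + 0.3 * math.sin(2 * math.pi * t / true_period) for t in times]
folded = phase_fold(times, mags, true_period)
```

When the trial period is correct, the folded points trace a single coherent cycle; at a wrong period they scatter, which is the contrast a periodicity verifier has to detect.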

PeRiodic Infrared Milky-way VVV Star-catalogue: PRIMVS

N. Miller, P. W. Lucas, Y. Sun, Z. Guo, W. J. Cooper, C. Morris
We present the PeRiodic Infrared Milky-way VVV Star-catalogue, ‘PRIMVS’ (not to be confused with "Primus"). We utilise the VVV survey’s unique depth and breadth to investigate the variability of astronomical sources within the Galactic bulge and disk. We focus on an unbiased and complete identification and classification of periodic variable stars. Employing internal metrics from the VIRAC table for initial selection, we meticulously clean and preprocess light curves to increase reliability and completeness. Care has been taken to address photometric contamination and other sources of uncertainty. Our approach includes constructing periodograms using Lomb-Scargle, Phase Dispersion Minimisation, Conditional Entropy, and Gaussian Processes to ascertain periodicity. This process allowed us to curate a catalogue of 86,507,172 candidate variable sources. Machine learning techniques, particularly decision trees and autoencoders, facilitated the initial steps in classification of a significant portion of these sources.
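
Of the period-finding methods listed above, Phase Dispersion Minimisation is the simplest to sketch. The example below is a minimal, assumption-laden version: the bin count, the trial-period grid and the synthetic data are all chosen for illustration and are not taken from the PRIMVS pipeline. It computes the statistic theta (pooled within-bin variance divided by total variance) and picks the trial period that minimises it.

```python
import math
import random

def pdm_theta(times, mags, period, n_bins=10):
    """PDM statistic: pooled variance within phase bins over total variance."""
    n = len(mags)
    mean = sum(mags) / n
    total_var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    bins = [[] for _ in range(n_bins)]
    for t, m in zip(times, mags):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(m)
    sum_sq, dof = 0.0, 0
    for members in bins:
        if len(members) > 1:
            bin_mean = sum(members) / len(members)
            sum_sq += sum((m - bin_mean) ** 2 for m in members)
            dof += len(members) - 1
    if dof == 0 or total_var == 0.0:
        return 1.0  # not enough information to measure any phase coherence
    return (sum_sq / dof) / total_var

# Synthetic, irregularly sampled light curve (hypothetical data).
rng = random.Random(42)
true_period = 2.5
times = sorted(rng.uniform(0.0, 100.0) for _ in range(200))
mags = [12.0 + 0.3 * math.sin(2 * math.pi * t / true_period) + rng.gauss(0.0, 0.02)
        for t in times]

# Illustrative trial grid from 1.0 to 4.0 days in steps of 0.01 days.
trial_periods = [1.0 + 0.01 * i for i in range(301)]
best_period = min(trial_periods, key=lambda p: pdm_theta(times, mags, p))
```

A theta near 1 means folding at that period does not reduce the scatter, while a value well below 1 indicates a coherent folded curve. A real survey would need a much finer, frequency-based grid than this toy one.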

Contrastive Curves

N. Miller, P. W. Lucas, Y. Sun, Z. Guo, W. J. Cooper, C. Morris
We demonstrate that it is possible to extract semantically meaningful, fixed-length representations of stochastically sampled time-series data. To achieve this, we use a novel neural network architecture: SimCLR with a gated recurrent unit (GRU) backbone.
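
The SimCLR framework trains its encoder with the normalised temperature-scaled cross-entropy (NT-Xent) loss. The sketch below computes that loss for already-encoded embeddings; the GRU encoder and the light-curve augmentations are not shown, and the embeddings, the temperature value and the function names are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z_a, z_b, temperature=0.5):
    """NT-Xent loss; z_a[i] and z_b[i] embed two views of the same light curve."""
    z = z_a + z_b
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of sample i
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        total += math.log(sum(math.exp(x) for x in logits)) - positive
    return total / (2 * n)

# Hypothetical 2-D embeddings of two light curves, each seen under two views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
loss_matched = nt_xent(view_a, view_b)           # views paired correctly
loss_mismatched = nt_xent(view_a, view_b[::-1])  # views paired with the wrong curve
```

The loss is lower when the two views of the same source lie close together and far from other sources, which is what pushes the embedding space towards semantically meaningful structure.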