# Give Or Take a Billion Years

Give Or Take a Billion YearsDIY age estimate of the universe based on data from Leda, the public database for physics of galaxies.

Rasmus GrothBlockedUnblockFollowFollowingApr 18As part of building the trading platform, CCAT, I am teaching myself data science.

My daughter, Laura, was doing a project on Hubble’s law so we thought we’d use this as an occasion to hang out.

Laura, did her report and I did a Jupyter notebook alongside to generate the image and practice my admittedly very rudimentary data science skills.

What you will gain from reading this is:Good datasets on galaxiesEquations to useUnits to useCode used to generate results and imagesWe had a lot of help from Peter Lauersen, astrophysicist at the University of Oslo and got inspiration from Zoninations Reddit post and R code.

Please note that this is a mirror of the actual executable Jupyter Notebook available here.

In the following we will use data from Leda, the database of physics of galaxies, to verify Hubbles law:In other words, there is a linear relationship between the distance of a galaxy and it’s speed relating to us — or that it follows the structure of a normal linear equation:With this knowledge we can calculate the age of the universeAquire the dataQueryThe following query was passed to the leda database interface:SELECT objname, v3k, mod0 WHERE (objtype='G') AND (v3k>3000) AND (v3k<30000) AND (mod0 IS NOT NULL) AND (v3k IS NOT NULL)We are elliminating objects that are not galaxies, (objtype=’G’), objects with very low velocities (v3k > 3000) because their local velocities and rotations skew the results, objects with very high velocities (v3k < 30000) because the light we are getting from them only reflects their velocities in the early universe when the accelleration was significantly different.

import pandas as pdimport numpy as npfrom numpy import log10,sqrtimport matplotlib.

pyplot as pltfrom astropy import units as ufrom astropy.

cosmology import Planck15import astropy.

constants as ccfrom scipy.

optimize import curve_fitThe results were saved in a text file called, leda.

txt locally and then loaded into a pandas dataframe in Python.

txt')The first five lines look like this:df.

head()Tidy the data# Remove empty column at the enddf = df.

iloc[:,0:3]# Remove rows with missing valuesdf = df.

dropna()# Remove whitespace from the headersdf = df.

rename(columns=lambda x: x.

strip())# Rename the objname column to galaxydf = df.

rename(columns={'objname':'galaxy'})# Display a sample of the dataframedf.

head()Luminosity Based DistanceConvert the magnitude (mod0) to Luminosity-distance in parsec:df['dl_mpc'] = 10**((df['mod0']+5)/5) / 1000000df.

head()Physical DistanceThe luminosity distance does not account for redshift and time dilation caused by differences in velocity and gravitational effects as the photons travel from the source to us.

To get the proper distance, we first need the redshift factor, ????, which we get by dividing the velocity with the speed of light (cc.

c):df[‘z’] = df[‘v3k’] / cc.

cAnd then we divide the luminosity distance, ????????, with the redshift factor 1+????:df['d_mpc'] = df['dl_mpc'] / (1+df['z'])Export Tidy Dataset# Save to filedf.

to_csv('galaxies.

csv')Best linear fitdef lin(x, a): return a * xcoeff, cov= curve_fit(lin, df['d_mpc'], df['v3k'])# The slope of the best linear fit a = coeffaResult:66.

59753735677145Age of the UniverseSeconds# Convert a from mpc based to km baseda_km = a / 1000000 / 30856775814913.

67age_sec = (1/a_km)age_sec = age_secage_secResult:4.

6333208463264954e+17Years# Age of the universe in yearsage_year = age_sec / (60*60*24*365)"{:,}".

format(int(age_year))Result:'14,692,164,023'ToleranceCo-variancecovResult:0.

20239139101111292????^2# Residual sum of squares (ss_tot)residuals = df['v3k'] – lin(df['d_mpc'], coeff)ss_res = np.

sum(residuals**2)# Total sum of squares (ss_tot)ss_tot = np.

sum((df['v3k']-np.

mean(df['v3k']))**2)# R squaredr_squared = 1 – (ss_res / ss_tot)r_squaredResult:0.

7590999713188041Plot# Store the distance in mpc in an arrayx = df['d_mpc'].

values# Store the velocity in km/s in an arrayy = df['v3k'].

values # v3k# Least Square Best Fit linef = lambda x: a * x# Initialize plot and subplotfig2 = plt.

figure(figsize=(25,20))g2 = fig2.

add_subplot(1, 1, 1)# Set background color to blackg2.

set_facecolor((0, 0, 0))# Plot datasetg2.

scatter(df['d_mpc'], df['v3k'], c='yellow', s=5)# Plot best fit lineg2.

plot(x,f(x), c="white", label="fit line", linewidth=0.