To abbreviate the code, we introduce the sum() method, a generator expression, and the removal of pow(). These three changes alone reduce the number of function calls by 30.37% and yield a speed improvement of 41.5%.

In [7]: %prun estimate_pi()

In [8]: %timeit -r 2 -n 5 estimate_pi()
3.68 s ± 2.39 ms per loop (mean ± std. dev. of 2 runs, 5 loops each)

Optimize with Vectorization

Since we know exactly beforehand how many random numbers should be generated, we can simply try to move everything before or outside the loop. Remember the if statement on line 10, which takes up nearly 30% of the computational time. The only information this if statement requires is two coordinates, so it too can be moved outside the loop. If the option is available, we should avoid looping code altogether. Especially in data science we are familiar with NumPy and pandas, highly optimized libraries for numerical computation. A big advantage of NumPy is that its arrays are internally based on C arrays, which are stored in a contiguous block of memory (a data-buffer-based array).

Here we create all random points at once as an array of shape (n, 2) (line 9) and count how many times the point falls inside the circle (line 10). If we now benchmark the NumPy version,

In [9]: %timeit estimate_pi()
388 ms ± 9.24 ms per loop (mean ± std. dev.
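The vectorized version described above could look like the following sketch; the variable names and the default of 1e7 simulations are assumptions for illustration, not the author's exact code.

```python
import numpy as np

def estimate_pi(n=10_000_000):
    # Create all random points at once as an array of shape (n, 2),
    # drawn uniformly from the unit square [0, 1) x [0, 1)
    points = np.random.random((n, 2))
    # Count the points that land inside the unit quarter circle
    inside = np.sum(points[:, 0] ** 2 + points[:, 1] ** 2 <= 1)
    # Ratio of quarter-circle area to square area is pi/4
    return 4 * inside / n
```

The whole simulation becomes three array operations with no Python-level loop, which is where the speedup comes from.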
of 7 runs, 1 loop each)

which comes out at 388 ms and is therefore 16 times faster than the while-loop version. NumPy's syntax is similar to standard Python: instead of sum() we write np.sum(), and essentially we are no longer iterating over a list; instead we use NumPy's vectorized routines.

Memory profiling

Juggling large data sets requires a clear view of the memory consumption and allocation processes going on in the background. As discussed earlier, there are tools to monitor the memory usage of your notebook.

Use %memit in a fashion familiar from %timeit:

In [10]: %memit estimate_pi()
peak memory: 623.36 MiB, increment: 152.59 MiB

We see that the function uses about 600 mebibytes for 1e7 simulations. My hypothesis is that allocating the large array contributes the largest part. We can test this hypothesis with %mprun, which checks the memory usage at every line:

In [11]: %mprun -f estimate_pi estimate_pi()

Line 4 is particularly interesting: its increment is 162.8 MiB, yet on the next line the overall memory usage rises by only 0.1 MiB. What happens here is that we allocate memory on the right-hand side of the assignment and then release it again, since inside is no longer a NumPy array.

To put things together, below is a one-liner doing the same as above, except that instead of allocating a large array of shape (n, 2) we square the x and y points on demand. Although we sacrifice some readability, rewriting the expression reduces the number of assignment operations, resulting in an even faster solution at 280 ms (22 times faster).

In [12]: %timeit estimate_pi()
280 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Out of memory

Creating the array is restricted by the available system memory. The allocation process scales linearly with the input parameter n. If, for example, we tried to set the number of simulations to 1e10, our kernel would crash while creating the array. More details
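The on-demand one-liner described in the memory-profiling section might look like this sketch; it is a plausible reconstruction rather than the author's exact expression, and it draws x and y as two independent 1-D arrays instead of one (n, 2) array.

```python
import numpy as np

def estimate_pi(n=10_000_000):
    # Square the x and y coordinates on demand: no named intermediate
    # arrays are kept around, so less memory stays allocated at once.
    return 4 * np.sum(np.random.random(n) ** 2 + np.random.random(n) ** 2 <= 1) / n
```

Statistically this is equivalent to sampling (x, y) pairs, since both coordinates are drawn independently and uniformly from [0, 1).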