# Predicting the Stock Market, p-Hacking and Why You Should Be Bullish

Or you could just use leverage and exotic instruments to multiply your returns and live passively off your modest savings.

So let’s take a deep dive into predicting the stock market using data science.

Some people tried to use a neural network, specifically a recurrent neural network, to predict market returns.

Let’s see if we can fit a simpler model using just random numbers!

## Random Number Generator as a Model

The `predict` function below creates a random set of normally distributed daily returns based on the historic standard deviation and mean returns. We set the seed explicitly so we can recreate the best model and use it for forecasting.

```python
import numpy as np
import matplotlib.pyplot as plt


def predict(mean, std, size, seed=None):
    """Returns normally distributed samples for the given mean, standard deviation and size."""
    np.random.seed(seed)  # a fixed seed makes the "model" reproducible
    return np.random.normal(loc=mean, scale=std, size=size)
```

The `apply_returns` function just applies our returns to a start price to give us a predicted stock price over time.

```python
def apply_returns(start, returns):
    """Applies the given periodic returns to a start price."""
    cur = start
    prices = [start]
    for r in returns:
        cur += r  # prices and returns are in log space here, so returns add
        prices.append(cur)
    return prices
```

And finally, we’ll want to score the returns and see how the prediction looks against the actual series. The plotting is what `compare` is for.

```python
def compare(prediction, actual):
    # plots a prediction over the actual series
    plt.plot(prediction, label="prediction")
    plt.plot(actual, label="actual")
    plt.legend()
```

Let’s see how a seed of 0 plays out.

```python
predict_deltas = predict(prior_log_deltas_mean, prior_log_deltas_std, latest_size, seed=0)
start = latest_log_prices[0]
prediction = apply_returns(start, predict_deltas)
print("MSE: {:0.08f}".format(score(latest_log_prices, prediction)))
compare(prediction=prediction, actual=latest_log_prices.values)
```

```
MSE: 0.00797138
```

Prediction using a seed of 0

It’s not great but it’s a start.

If you want to test 20 different factors, specify the factors before you start testing and consider all 20 factors when evaluating your metrics.

But most importantly, ask yourself what your model is doing.

If you’re doing natural language processing, consider word vectors in relation to synonyms, antonyms and related words. And if you’re doing stock market analysis, ask yourself what you’re actually trying to get out of the model.

Why did you validate up to Z?

Just feeding stock deltas into a recurrent neural network may achieve the goal of being able to decrease the loss, but without an explanation, you may as well be fitting the values to a random number generator.
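To make the random-number-generator comparison concrete, here is a minimal sketch of what "fitting" to a seed could look like. It is not the article's own code. The `score` function is assumed to be a plain mean squared error (suggested by the `MSE` label in the printed output), `best_seed` is a hypothetical helper, and the price series is synthetic, with its mean and standard deviation estimated from that same series for brevity.

```python
import numpy as np


def predict(mean, std, size, seed=None):
    """Draw `size` normally distributed returns from a seeded generator."""
    np.random.seed(seed)
    return np.random.normal(loc=mean, scale=std, size=size)


def apply_returns(start, returns):
    """Cumulatively add (log) returns to a start (log) price."""
    return np.concatenate(([start], start + np.cumsum(returns)))


def score(actual, prediction):
    """Mean squared error (assumed definition; the original is not shown)."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.mean((actual - prediction) ** 2))


def best_seed(actual, mean, std, n_seeds):
    """Hypothetical helper: try seeds 0..n_seeds-1 and keep the lowest MSE."""
    best, best_mse = None, float("inf")
    for seed in range(n_seeds):
        deltas = predict(mean, std, len(actual) - 1, seed=seed)
        mse = score(actual, apply_returns(actual[0], deltas))
        if mse < best_mse:
            best, best_mse = seed, mse
    return best, best_mse


# Synthetic stand-in for a log-price series (an assumption, not real data).
np.random.seed(12345)
actual = apply_returns(7.0, np.random.normal(0.0003, 0.01, size=250))
deltas = np.diff(actual)

seed_1, mse_1 = best_seed(actual, deltas.mean(), deltas.std(), n_seeds=1)
seed_many, mse_many = best_seed(actual, deltas.mean(), deltas.std(), n_seeds=500)

print("seed 0 only:       MSE {:0.08f}".format(mse_1))
print("best of 500 seeds: MSE {:0.08f} (seed {})".format(mse_many, seed_many))
```

Searching more seeds can only keep or lower the best in-sample MSE, because every earlier candidate is still in the pool. That is the p-hacking trap in miniature: a good fit found by searching says nothing about whether the winning seed can forecast anything.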