Optimize your CPU for Deep Learning

By now, you have set up Intel's Python distribution on your PC or laptop.

It’s time to enter the ML pipeline.

Intel provides TensorFlow optimizations through all of the major distribution channels, and they are very smooth to set up.

You can read more about it here.

Let's see how we can install an optimized TensorFlow build for our CPU.

Intel provides an optimized Math Kernel Library (MKL), which accelerates mathematical operations and delivers the required speed-up.

Thus, we will install tensorflow-mkl as follows.

conda install tensorflow-mkl

Or with pip, one can set it up as follows.

pip install intel-tensorflow

Voila! TensorFlow is now up and running in your system with the necessary optimizations.

And if you are a Keras fan, you can set it up with a simple command:

conda install keras -c intel

4) Set up Jupyter

Since we have created a new virtual environment, it will not come with Spyder or Jupyter notebooks by default.

However, it is straightforward to set these up.

With a single line, we can do wonders.

conda install jupyter -c intel

5) Activate the Environment and Start Experimenting

Now that we have set up everything, it's time to get our hands dirty as we start coding and experimenting with various ML and DL approaches on our optimized CPU systems.

Firstly, before executing any code, make sure that you are using the right environment.

You need to activate the virtual environment before you can use the libraries installed in it.

You will need to repeat this activation step every time you open a new prompt, but it is effortless.

Write the following command in your Anaconda prompt, and you're good to go.

conda activate intel

To run a sanity check on your environment, type the following in the command prompt/shell once the environment is activated.

python

Once you press Enter after typing python, the following text should appear in your command prompt.

Make sure it says “Intel Corporation” between the pipes and that the message “Intel(R) Distribution for Python is brought to you by Intel Corporation.” is shown.


These validate the correct installation of Intel’s Python Distribution.
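These checks can also be scripted rather than read off the banner. A minimal sketch, assuming the process runs under an activated conda environment (conda exposes the active environment's name via the CONDA_DEFAULT_ENV variable):

```python
import os
import sys

# On Intel's Distribution for Python, the version string contains
# "Intel Corporation"; a stock CPython build will not.
print(sys.version)
print("Intel build:", "Intel Corporation" in sys.version)

# conda sets CONDA_DEFAULT_ENV when an environment is activated,
# so this should print "intel" if you followed the steps above.
print("Active env:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))
print("Interpreter:", sys.executable)
```

This is handy inside scripts too, for example to warn early when a job was accidentally launched from the base environment.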

Python 3.6.8 |Intel Corporation| (default, Feb 27 2019, 19:55:17) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution

Now you can use the command line to experiment, or write your scripts elsewhere and save them with the .py extension.

These files can then be accessed by navigating to the file's location via the "cd" command and running the script with:

(intel) C:\Users\User>python script.py

By following steps 1 to 4, you will have your system ready at the level of the Intel xyz* entries mentioned in the performance benchmark charts above.

This setup is, however, still not thread-optimized for multi-core processors.

I will discuss below how to achieve further optimization for your multi-core CPU.

Multi-Core Optimization

To add further optimizations for your multi-core system, you can add the following lines of code to your .py file, and it will execute the scripts accordingly.

Here, NUM_PARALLEL_EXEC_UNITS represents the number of cores you have; I have a quad-core i7, hence the number is 4.

For Windows users, you can check the core count in Task Manager by navigating to Task Manager -> Performance -> CPU -> Cores.
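On any OS, you can also query the core count from Python itself. One caveat: the standard library reports logical cores, which on a hyper-threaded CPU is typically double the physical-core figure Task Manager shows under "Cores". A small sketch:

```python
import multiprocessing
import os

# Logical core count (includes hyper-threads). On a quad-core i7 with
# hyper-threading this typically prints 8, while Task Manager's "Cores"
# figure is 4 - use the physical count for NUM_PARALLEL_EXEC_UNITS.
logical_cores = multiprocessing.cpu_count()
print("Logical cores:", logical_cores)

# os.cpu_count() reports the same value on Python 3.4+
print("os.cpu_count():", os.cpu_count())
```

Halving the logical count is a reasonable default for the physical count on most Intel desktop CPUs, but checking Task Manager (or your CPU's spec sheet) is the reliable route.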

from keras import backend as K
import tensorflow as tf
import os

NUM_PARALLEL_EXEC_UNITS = 4
config = tf.ConfigProto(intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS, inter_op_parallelism_threads=2, allow_soft_placement=True, device_count={'CPU': NUM_PARALLEL_EXEC_UNITS})
session = tf.Session(config=config)
K.set_session(session)
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

If you're not using Keras and prefer core TensorFlow, the script remains almost the same; just remove the following 2 lines.

from keras import backend as K
K.set_session(session)

After adding these lines to your code, the speed-up should be comparable to the Intel xyz(O) entries in the performance charts above.
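For reference, the four environment variables in the script configure Intel's OpenMP runtime, which MKL uses under the hood. A small reusable sketch (the helper name configure_openmp is mine; the values simply mirror the article's settings, and they must be set before TensorFlow is first imported to take effect):

```python
import os

def configure_openmp(num_threads):
    """Set Intel OpenMP / MKL threading variables (hypothetical helper)."""
    # Threads available to each parallelized operation.
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    # Milliseconds a worker thread spins waiting for more work before sleeping.
    os.environ["KMP_BLOCKTIME"] = "30"
    # Print the effective OpenMP settings at startup, useful for debugging.
    os.environ["KMP_SETTINGS"] = "1"
    # Pin threads to cores so they don't migrate and thrash caches.
    os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

configure_openmp(4)
print(os.environ["OMP_NUM_THREADS"])
```

Keeping these in one helper makes it easy to experiment with different thread counts across scripts.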

If you have a GPU in your system and it conflicts with the current set of libraries or throws a cuDNN error, you can add the following line to your code to disable the GPU.
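One caveat worth knowing: CUDA-aware libraries read this variable when they initialize, so it must be set before TensorFlow is first imported, or it is silently ignored. A minimal sketch of the ordering:

```python
import os

# Hide all GPUs from CUDA. This assignment must come BEFORE the first
# "import tensorflow", or TensorFlow will have already claimed the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf  # imported only after the variable is set

print(os.environ["CUDA_VISIBLE_DEVICES"])
```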


import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

Conclusion

That's it.

You have now an optimized pipeline to test and develop machine learning projects and ideas.

This opens up a lot of opportunities for students involved in academic research to carry on their work with whatever system they have.

This pipeline also eases worries about the privacy of the data a practitioner might be working with, since everything stays on the local machine.

It is also worth noting that, with proper fine-tuning, one can obtain a 3.45x speed-up in their workflow, which means that if you are experimenting with your ideas, you can now work roughly three times as fast as before.

