Understanding and using Python virtualenvs from a Data Scientist's perspective

Ernesto Arbitrio · Jul 13, 2017

A Virtual Environment is a really good way to keep the dependencies required by different projects in separate places, by creating a virtual Python environment for each of them.
It solves the “Project X depends on version 1.x but Project Y needs 4.x” dilemma, and keeps your global site-packages directory clean and manageable. For example, you can work on a project which requires matplotlib 1.3 while also maintaining a project which requires an older matplotlib release.
There are different tools that can manage Python virtual environments; the ones I will show are:

- virtualenv
- anaconda (by Continuum Analytics)

The first is the most popular for general-purpose coding projects; anaconda is much more suitable for those who work in the data science field. In this post I would like to explain how to install and use both of them to create your Python virtualenv.
Virtualenv

When you run the virtualenv command inside your shell, it creates a folder which contains all the necessary executables to use the packages that a Python project would need. To install virtualenv via pip:

$ pip install virtualenv

To install pip via your O.S. package manager (Ubuntu for instance):

$ sudo apt-get install python-pip

Basic usage

Create a first virtual environment for your project:

$ cd my_venv_dir
$ virtualenv myproject

virtualenv myproject will create a folder in the directory you are in, which will contain the Python executable files and a copy of the pip library which you can use to install other packages.
Usually I do not create the virtual environment within the source code directory in order to keep venv and code separate.
In this way I don’t need to exclude the virtualenv directory from the version control system I will use for software versioning.
In your virtualenv directory you should have something like this:

$ ls -la
drwxr-xr-x 16 user staff 544 May 15 06:33 bin
drwxr-xr-x  3 user staff 102 May 15 06:30 include
drwxr-xr-x  4 user staff 136 May 15 06:33 lib
-rw-r--r--  1 user staff  60 May 15 06:33 pip-selfcheck.json

You can also use the Python interpreter of your choice (like python2):

$ virtualenv -p /usr/local/bin/python2 myproject

Naturally you need to change the python2 path to match yours.
To begin using the virtual environment, it needs to be activated:

$ source myproject/bin/activate

The name of the current virtual environment will now appear as a prefix of the prompt (e.g. (myproject)username@yourcomputer) to let you know that it's active. From now on, any package that you install using pip will be placed in the myproject folder; this way the global Python installation will remain clean.
Install packages as usual:

$ pip install matplotlib

When you are done, deactivate your virtualenv with:

$ deactivate

This puts you back in the system's default Python interpreter with all its installed libraries. To delete a virtual environment, just delete its folder (in this case, rm -rf myproject). After a while, though, you might end up with a lot of virtual environments littered across your system, and it's possible you'll forget their names or where they were placed.
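The whole lifecycle above can be sketched as a short shell session; as a minimal sketch it uses the standard-library python3 -m venv, which behaves like the virtualenv tool for this purpose (the directory name myproject is just an example):

```shell
# create a venv next to (not inside) the source tree;
# python3 -m venv is the stdlib equivalent of the virtualenv tool
python3 -m venv myproject

# "activating" just puts the venv's bin/ first on PATH
. myproject/bin/activate

# python now resolves inside the venv
command -v python

# leave the venv and restore the system interpreter
deactivate

# deleting the folder deletes the environment
rm -rf myproject
```

Since activation only rewrites PATH for the current shell, opening a new terminal always starts you back in the system interpreter.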
Extra commands

In order to keep your environment consistent, it's a good idea to “freeze” the current state of the environment packages. To do this, run:

$ pip freeze > requirements.txt

This will create a requirements.txt file, which contains a list of all the packages in the current environment and their respective versions. (You can see the list of installed packages without the requirements format using pip list.) Later it will be easier for a different developer (or you, if you need to re-create the environment) to install the same packages using the same versions:

$ pip install -r requirements.txt

Remember to add the requirements file to your current project directory; this can help ensure consistency across installations, across deployments, and across developers.
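The freeze/restore round trip looks like this as a sketch (what pip freeze actually emits depends on what is installed in the active environment):

```shell
# snapshot the exact versions installed in the currently active environment
python3 -m pip freeze > requirements.txt

# each emitted line pins one package, in the form package==version
head -n 3 requirements.txt

# later, replay the snapshot in a fresh environment with:
#   pip install -r requirements.txt
```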
Now let's take a look at the packages you need for starting a new virtual environment for data analytics projects. The ones I use most are numpy, pandas, matplotlib, jupyter notebook and scipy; naturally each of you can use the ones you prefer. As you saw in the previous paragraphs, to install a Python lib in a virtualenv you may use the pip command; to install each package there are two simple ways: install them one by one, or create a requirements.txt file with one library per line:

numpy
pandas
matplotlib
jupyter
scipy

or, if you want to avoid the installation of the latest stable release of a package, you can pin its version:

numpy==1.13.1
scipy

Now pip install -r requirements.txt will install all your software and the related dependencies, and your data-science-oriented virtualenv is ready to go.
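As a sketch, the pinned variant can be written and sanity-checked from the shell (the numpy version number is illustrative, not a recommendation):

```shell
# a data-science starter set; pin only what you need reproducible
cat > requirements.txt <<'EOF'
numpy==1.13.1
pandas
matplotlib
jupyter
scipy
EOF

# each line must be either "name" or "name==version"
grep -Ec '^[A-Za-z0-9_-]+(==[0-9][0-9A-Za-z.]*)?$' requirements.txt   # prints 5

# then: pip install -r requirements.txt
```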
The Anaconda© python distribution

From my personal point of view, a more appropriate way than using virtualenv is to adopt the Anaconda platform.
Anaconda is the leading open data science platform powered by Python.
The open source version of Anaconda is a high performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science.
Additionally, you'll have access to over 720 packages that can easily be installed with conda, Anaconda's package, dependency and environment manager.
If you're interested in “Why should I use Anaconda?”, I suggest you read this thread: https://www.reddit.com/r/Python/comments/3t23vv/what_advantages_are_there_of_using_anaconda/. The two advantages I have found most useful in my own experience are the user-level install of the version of Python you want, and the “batteries included” for data science (e.g. numpy, scipy, PyQt, the spyder IDE, etc.).
Installing Anaconda is very simple: just download the right package for your O.S. and follow the instructions on the page to install it in the proper way. For instance, if you use Linux or Mac OSX:

$ bash Anaconda-v-0.sh

and follow the instructions on the screen. Note that during the installation process Anaconda will ask you whether to add a directive to your bash profile to change your default python path. If you accept, the Anaconda python distribution will be your new python ecosystem; if you answer no, you will have to select the Anaconda python executable by hand.
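What that bash-profile directive typically boils down to is a single PATH export; the install location below is an assumption (~/anaconda3 is a common default, yours may differ):

```shell
# prepend Anaconda's bin/ so its python shadows the system one
# (assumed install location; adjust to where you installed Anaconda)
export PATH="$HOME/anaconda3/bin:$PATH"

# the shell now resolves python from Anaconda's bin/ first
echo "$PATH"
```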
Now that Anaconda is installed you are able to create a new conda environment, or you can simply use the root env, which already comes with a lot of packages.
Let's see how to create a conda env now:

$ conda create -n envname
# if you want to pin a specific python version
$ conda create -n envname python=2.7

Once the env is created you may activate it:

$ source activate envname

This should produce something like this:

(envname) user@hostname$
# check your python version
(envname) user@hostname$ python
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Now, as we did with the virtualenv tool, we can install the packages we need in our conda env with the conda command:

(envname) user@hostname$ conda install pandas   # e.g. install the pandas package

Managing the environments is quite easy; there are several options to use with the conda command, and there is a good cheat sheet in the conda docs with all the references you need.
Remember that Anaconda is batteries included so it’s possible you have all the packages you need in the root env.
Without creating or activating any environment, check the libraries installed in your conda root instance; you should see something like this:

user@hostname$ conda list
# (a long table of packages with their version and build, e.g. sasl, scikit-image,
#  scikit-learn, scipy, scp, seaborn; entries installed via pip are marked <pip>)
As you can see, some packages in the previous snippet are installed via pip: this means that with Anaconda, and within any conda env, you can also use the pip command to get the Python modules you want.
Import and Export a conda environment

To enable other people to create an exact copy of your environment, you can export the active environment to a file. Activate the environment you wish to export:

Linux, OS X: source activate envname
Windows: activate envname

Now export your env to a new declarative file:

conda env export > environment.yml

NOTE: if you already have an environment.yml file in your current directory, it will be overwritten with the new file. If, instead of exporting your env, you want to create a new conda environment from a .yml file:

conda env create -f environment.yml

The environment files can also be created by hand, just by being compliant with some basic rules. For instance, if you want to set your dependencies, this is the layout to follow:

name: stats
dependencies:
  - numpy
  - pandas
  - scipy

and then save the file with the name you want. For more details, here is the chapter of the Anaconda guide related to this topic.
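For reference, a slightly fuller hand-written environment.yml might look like the sketch below; the channel, python pin, and pip package name are illustrative assumptions, not from the original post:

```yaml
name: stats
channels:
  - defaults
dependencies:
  - python=3.5          # pin the interpreter if you need reproducibility
  - numpy
  - pandas
  - scipy
  - pip
  - pip:
      - some-pip-only-package   # hypothetical: for packages not on conda channels
```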
Unlike pip, conda is language-agnostic; this permits you to use the R language in your conda environment, and obviously to create R-based notebooks with jupyter. The Anaconda team has created an “R Essentials” bundle with the IRkernel and over 80 of the most used R packages for data science, including dplyr, shiny, ggplot2, tidyr, caret and nnet.
Downloading “R Essentials” requires conda.
Miniconda includes conda, Python, and a few other necessary packages, while Anaconda includes all this and over 200 of the most popular Python packages for science, math, engineering, and data analysis.
Users may install all of Anaconda at once, or they may install Miniconda at first and then use conda to install any other packages they need, including any of the packages in Anaconda.
Once you have conda, you may install “R Essentials” into the current environment:

conda install -c r r-essentials

Now, starting jupyter notebook from your virtual environment ($ jupyter notebook), you are able to create a new R notebook.

References

Part of this post is inspired by https://python-guide.readthedocs.io/en/latest/dev/virtualenvs/.