28

There are numerous tools available for data science tasks, and it is cumbersome to install everything and build up a perfect system.

Is there a Linux/Mac OS image with Python, R and other open-source data science tools installed and available for people to use right away? An Ubuntu or other lightweight OS image with the latest versions of Python and R (including IDEs) and other open-source data visualization tools installed would be ideal. I haven't come across one in my quick search on Google.

Please let me know if any exist, or if any of you have created one for yourselves. I assume some universities might have their own VM images. Please share any such links.

VividD
JeanVuda

5 Answers

13

There is another choice that has become popular recently: Docker (https://www.docker.com). Docker is a container platform that lets you create and maintain a working environment easily and quickly.
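As a rough illustration of what "easily and quickly" can look like, here is a minimal Python sketch that assembles a `docker run` command for a prebuilt data science image. The image name `jupyter/datascience-notebook`, the mount path `/home/jovyan/work`, and the helper names `build_run_command` and `launch` are my own assumptions for the example, not something the setup above depends on; substitute whatever image suits you.

```python
import shlex
import subprocess

# Assumption: this community image is used purely as an illustration.
DEFAULT_IMAGE = "jupyter/datascience-notebook"


def build_run_command(image=DEFAULT_IMAGE, host_port=8888, workdir=None):
    """Return the argv list for a throwaway, interactive container."""
    # --rm removes the container on exit; -it attaches an interactive terminal;
    # -p maps a host port to port 8888 inside the container.
    cmd = ["docker", "run", "--rm", "-it", "-p", f"{host_port}:8888"]
    if workdir is not None:
        # Mount a host directory so that work survives container removal.
        # Assumption: /home/jovyan/work is the working directory in the image.
        cmd += ["-v", f"{workdir}:/home/jovyan/work"]
    cmd.append(image)
    return cmd


def launch(**kwargs):
    """Start the container; requires a local Docker installation."""
    cmd = build_run_command(**kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

Calling `launch(workdir="/path/to/project")` would pull the image on first use and start it; `build_run_command()` alone just shows the command without running anything.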

Hope that helps.

fansia

12

If you are looking for a VM with a bunch of tools preinstalled, try the Data Science Toolbox.

Sean Owen

8

While Docker images are now more trendy, I personally find Docker technology not user-friendly, even for advanced users. If you are OK with using non-local VM images and can use Amazon Web Services (AWS) EC2, consider R-focused images for data science projects, pre-built by Louis Aslett. The images contain very recent, if not the latest, versions of Ubuntu LTS, R and RStudio Server. You can access them here.

Besides the main components listed above, the images also have many useful data science tools built in. For example, they support LaTeX, ODBC, OpenGL, Git, optimized numeric libraries and more.

Aleksandr Blekh

5

Have you tried Cloudera's QuickStart VM?

I found it very easy to run, and it includes open-source software such as Mahout and Spark.

Emre Sevinç

5

Today I used this repository and built it with Docker. It is a Docker image that builds Spark on top of a Hadoop image from the same owner. If you want to use Spark, it has a Python API called PySpark.
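To give a feel for the Python API, here is a minimal word-count sketch. It assumes PySpark and a Java runtime are available inside the container and runs Spark in local mode rather than against the Hadoop setup; the function name `count_words` and the app name are hypothetical, chosen only for this example.

```python
def count_words(lines):
    """Count word occurrences with Spark running in local mode."""
    # Imported lazily so the function can be defined without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")            # use all local cores, no cluster needed
        .appName("wordcount-sketch")   # hypothetical application name
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(lines)
        counts = (
            rdd.flatMap(lambda line: line.split())  # split lines into words
            .map(lambda word: (word, 1))            # pair each word with 1
            .reduceByKey(lambda a, b: a + b)        # sum the counts per word
        )
        return dict(counts.collect())
    finally:
        # Always release the local Spark session.
        spark.stop()
```

For instance, `count_words(["spark is fast", "spark is fun"])` returns a dictionary mapping each word to its count, with `"spark"` and `"is"` counted twice.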

Ethan
Evren Kutar