Use for questions about the software and hardware used to assist in, and solve, data science problems.
Questions tagged [tools]
70 questions
63
votes
9 answers
Tools and protocol for reproducible data science using Python
I am working on a data science project using Python.
The project has several stages.
Each stage consists of taking a data set, using Python scripts, auxiliary data, configuration and parameters, and creating another data set.
I store the code in…
Yuval F
61
votes
10 answers
IDE alternatives for R programming (RStudio, IntelliJ IDEA, Eclipse, Visual Studio)
I use RStudio for R programming. I remember solid IDEs from other technology stacks, like Visual Studio or Eclipse.
I have two questions:
What IDEs other than RStudio are used? (Please consider providing a brief description of each.)
Does…
IgorS
58
votes
8 answers
Why do internet companies prefer Java/Python for data scientist jobs?
I often see job descriptions for data scientists that ask for Python/Java experience and disregard R. Below is a personal email I received from the chief data scientist of a company I applied to through LinkedIn.
X, Thanks for connecting and…
StatguyUser
44
votes
10 answers
Do data scientists use Excel?
I would consider myself a journeyman data scientist. Like most (I think), I made my first charts and did my first aggregations in high school and college, using Excel. As I went through college, grad school and ~7 years of work experience, I…
JHowIX
28
votes
4 answers
What makes columnar databases suitable for data science?
What are some of the advantages of columnar data-stores which make them more suitable for data science and analytics?
Dawny33
28
votes
5 answers
VM image for data science projects
There are numerous tools available for data science tasks, and it's cumbersome to install everything and build up a perfect system.
Is there a Linux/Mac OS image with Python, R and other open-source data science tools installed and available for…
JeanVuda
15
votes
2 answers
What is the difference between Hadoop and NoSQL?
I have heard about many tools and frameworks that help people process their data (in a big data environment).
One is called Hadoop and the other is the NoSQL concept. How do they differ in terms of processing?
Are they complementary?
рüффп
13
votes
5 answers
What are helpful annotation tools (if any)?
I'm looking for tools that would help me and my team annotate training sets. I work in an environment with large sets of data, some of which are un- or semi-structured. In many cases there are registrations that help in finding a ground truth. In…
S van Balen
12
votes
3 answers
Best languages for scientific computing
It seems as though most languages have some number of scientific computing libraries available.
Python has SciPy
Rust has SciRust
C++ has several including ViennaCL and Armadillo
Java has Java Numerics and Colt as well as several others
Not to…
ragingSloth
12
votes
2 answers
Bookkeeping of experiment runs and results
I am a hands-on researcher and I like testing out viable solutions, so I tend to run a lot of experiments. For example, if I am calculating a similarity score between documents, I might want to try out many measures. In fact, for each measure I…
machine-wisdom
12
votes
2 answers
Open-source tools to help mine a stream of leaderboard scores
Consider a stream containing tuples (user, new_score) representing users' scores in an online game. The stream could have 100-1,000 new elements per second. The game has 200K to 300K unique players.
I would like to have some standing queries like:…
Tahir Akhtar
11
votes
4 answers
What initial steps should I use to make sense of large data sets, and what tools should I use?
Caveat: I am a complete beginner when it comes to machine learning, but eager to learn.
I have a large dataset and I'm trying to find patterns in it. There may or may not be correlations across the data, either with known variables, or variables that…
user3791372
10
votes
5 answers
Tool to Generate 2D Data via Mouse Clicking
Often when I am learning new machine learning methods or experimenting with a data analysis algorithm I need to generate a series of 2D points. Teachers also do this often when making a lesson or tutorial.
In some cases I just create a function, add…
MD004
9
votes
3 answers
Google Prediction API: What training/prediction methods does the Google Prediction API employ?
The details of the Google Prediction API are on this page, but I am not able to find any details about the prediction algorithms running behind the API.
So far I have gathered that they let you provide your preprocessing steps in PMML format.
Tahir Akhtar
7
votes
1 answer
Lightweight data provenance tool
One of the problems I often encounter is that of poor data provenance.
When I do research I continuously make modifications to my code and rerun experiments. Each time I'm faced with a number of questions, such as: do I save the old results…
Benjamin B.