Questions tagged [efficiency]

Efficiency, in algorithmic processing, is usually associated with resource usage. The metrics used to evaluate the efficiency of a process commonly account for execution time, memory/disk or storage requirements, network usage, and power consumption.

42 questions
94
votes
12 answers

How big is big data?

Lots of people use the term big data in a rather commercial way, as a means of indicating that large datasets are involved in the computation, and therefore potential solutions must have good performance. Of course, big data always carry associated…
Rubens
  • 4,117
65
votes
6 answers

When is a Model Underfitted?

Logic often states that by underfitting a model, its capacity to generalize is increased. That said, clearly at some point underfitting causes models to become worse regardless of the complexity of the data. How do you know when your model has…
blunders
  • 1,932
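A minimal sketch of one practical way to check this, assuming scikit-learn and a synthetic dataset (all names and thresholds below are illustrative): sweep model capacity and flag settings where training and validation scores are both low and close together, which is the usual signature of underfitting.

```python
# Minimal sketch (assumes scikit-learn; synthetic data): sweep model capacity
# and flag settings where training and validation R^2 are both low and close
# together -- the usual signature of an underfitted model.
from sklearn.datasets import make_regression
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

depths = [1, 2, 4, 8, 16]  # capacity knob: shallow trees underfit, deep ones don't
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5,
)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # Illustrative thresholds only: low score AND small train/val gap.
    flag = "likely underfitted" if tr < 0.7 and abs(tr - va) < 0.1 else ""
    print(f"max_depth={d:2d}  train R^2={tr:.2f}  val R^2={va:.2f}  {flag}")
```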
22
votes
1 answer

XGBRegressor vs. xgboost.train huge speed difference?

If I train my model using the following code: import xgboost as xg params = {'max_depth':3, 'min_child_weight':10, 'learning_rate':0.3, 'subsample':0.5, 'colsample_bytree':0.6, 'obj':'reg:linear', 'n_estimators':1000, 'eta':0.3} features =…
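For reference, a minimal sketch (with made-up data) contrasting the two interfaces mentioned in the question. Note that the native API sets the number of boosting rounds through num_boost_round rather than an 'n_estimators' entry in params, which by itself can account for a large apparent speed gap.

```python
# Minimal sketch (hypothetical data) contrasting xgboost's scikit-learn wrapper
# (XGBRegressor) with the native xgboost.train API.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 20)
y = np.random.rand(1000)

params = {'max_depth': 3, 'min_child_weight': 10, 'learning_rate': 0.3,
          'subsample': 0.5, 'colsample_bytree': 0.6,
          'objective': 'reg:squarederror'}

# scikit-learn style wrapper: n_estimators is an explicit constructor argument.
sk_model = xgb.XGBRegressor(n_estimators=1000, **params)
sk_model.fit(X, y)

# Native API: data goes through a DMatrix, and the number of boosting rounds is
# passed separately; leaving num_boost_round at its default is a common source
# of large timing differences between the two code paths.
dtrain = xgb.DMatrix(X, label=y)
native_model = xgb.train(params, dtrain, num_boost_round=1000)
```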
14
votes
4 answers

Looking for example infrastructure stacks/workflows/pipelines

I'm trying to understand how all the "big data" components play together in a real-world use case, e.g. Hadoop, MongoDB/NoSQL, Storm, Kafka, ... I know that this is quite a wide range of tools used for different types, but I'd like to get to know…
13
votes
2 answers

Is FPGrowth still considered "state of the art" in frequent pattern mining?

As far as I know, the development of algorithms to solve the Frequent Pattern Mining (FPM) problem has passed some main checkpoints along its road of improvements. Firstly, the Apriori algorithm was proposed in 1993 by Agrawal et al., along with the…
Rubens
  • 4,117
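As a point of reference for what any FPM algorithm ultimately computes, here is a naive support-counting sketch over a toy set of transactions (Apriori and FP-Growth differ in how they prune and traverse this search space, not in what "frequent" means).

```python
# Toy sketch (hypothetical transactions): naive support counting for small
# itemsets, the quantity that Apriori and FP-Growth compute efficiently.
from itertools import combinations
from collections import Counter

transactions = [
    {'bread', 'milk'},
    {'bread', 'diapers', 'beer', 'eggs'},
    {'milk', 'diapers', 'beer', 'cola'},
    {'bread', 'milk', 'diapers', 'beer'},
    {'bread', 'milk', 'diapers', 'cola'},
]
min_support = 0.6  # fraction of transactions an itemset must appear in

counts = Counter()
for t in transactions:
    for k in (1, 2):  # 1- and 2-itemsets only, for brevity
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1

frequent = {s: c / len(transactions) for s, c in counts.items()
            if c / len(transactions) >= min_support}
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(itemset, round(support, 2))
```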
12
votes
3 answers

Best languages for scientific computing

It seems as though most languages have some number of scientific computing libraries available. Python has SciPy, Rust has SciRust, C++ has several including ViennaCL and Armadillo, and Java has Java Numerics and Colt as well as several others. Not to…
ragingSloth
  • 1,854
12
votes
2 answers

Tradeoffs between Storm and Hadoop (MapReduce)

Can someone kindly tell me about the trade-offs involved when choosing between Storm and MapReduce in a Hadoop cluster for data processing? Of course, aside from the obvious one, that Hadoop (processing via MapReduce in a Hadoop cluster) is a batch…
mbbce
  • 347
10
votes
3 answers

How do various statistical techniques (regression, PCA, etc) scale with sample size and dimension?

Is there a known general table of statistical techniques that explains how they scale with sample size and dimension? For example, a friend of mine told me the other day that the computation time of simply quick-sorting one-dimensional data of size n…
Bridgeburners
  • 229
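Absent such a table, a rough empirical check is easy to run. The sketch below (synthetic data, illustrative sizes) times a sort, expected to grow as O(n log n), and a thin SVD, roughly O(n·d²) for an n×d matrix with n ≥ d, as n doubles.

```python
# Rough empirical sketch (synthetic data): time two operations as n grows to
# observe their scaling in practice -- sorting (O(n log n)) and a thin SVD,
# which for an n x d matrix with n >= d costs roughly O(n * d^2).
import time
import numpy as np

d = 50
for n in (10_000, 20_000, 40_000, 80_000):
    x = np.random.rand(n)
    A = np.random.rand(n, d)

    t0 = time.perf_counter()
    np.sort(x)
    t_sort = time.perf_counter() - t0

    t0 = time.perf_counter()
    np.linalg.svd(A, full_matrices=False)
    t_svd = time.perf_counter() - t0

    print(f"n={n:6d}  sort={t_sort:.4f}s  svd={t_svd:.4f}s")
```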
10
votes
2 answers

Kernel trick explanation

In support vector machines, I understand it would be computationally prohibitive to calculate a basis function at every point in the data set. However, it is possible to find this optimal solution due to the so-called kernel trick. Other answers to…
user1717828
  • 245
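A small numeric sketch of the idea: for a homogeneous degree-2 polynomial kernel on R², K(x, z) = (x·z)² equals the inner product of explicit quadratic feature maps, so the feature mapping never has to be computed.

```python
# Numeric sketch of the kernel trick for a degree-2 polynomial kernel in R^2:
# K(x, z) = (x . z)^2 matches the dot product of the explicit feature maps
# phi(v) = (v1^2, sqrt(2) v1 v2, v2^2), so phi itself never needs materialising.
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-d vector."""
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

kernel_value = np.dot(x, z) ** 2          # computed in the original 2-d space
explicit_value = np.dot(phi(x), phi(z))   # computed in the 3-d feature space

print(kernel_value, explicit_value)       # both print 16.0
```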
10
votes
1 answer

What is the most efficient data indexing technique

As we all know, there are some data indexing techniques used by well-known indexing apps, like Lucene (for Java) or Lucene.NET (for .NET), MurmurHash, B+Tree, etc. For a NoSQL / object-oriented database (which I try to write/play a little around…
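For orientation, a toy sketch of the inverted-index idea behind Lucene-style text indexing (document texts and names below are made up): map each term to the set of documents containing it, so a lookup never scans the whole collection.

```python
# Minimal sketch (toy documents) of an inverted index: each term maps to the
# set of document ids containing it, so term lookups avoid a full scan.
from collections import defaultdict

documents = {
    1: "efficient data indexing with b+trees",
    2: "hashing with murmurhash is fast",
    3: "lucene builds an inverted index over terms",
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        index[term].add(doc_id)

def search(term):
    """Return the ids of documents containing the exact term."""
    return sorted(index.get(term, set()))

print(search("indexing"))   # [1]
print(search("inverted"))   # [3]
```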
10
votes
4 answers

Why is it hard to grant efficiency while using libraries?

Any small database processing task can be easily tackled by Python/Perl/... scripts that use libraries and/or even utilities from the language itself. However, when it comes to performance, people tend to reach for C/C++/low-level languages. The…
Rubens
  • 4,117
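A small illustration of the overhead in question, using synthetic data: the same reduction written as an interpreted Python loop and as a single call into NumPy's compiled code.

```python
# Small illustration (synthetic data): an interpreted per-element loop versus
# one call into NumPy's compiled code for the same sum.
import time
import numpy as np

values = np.random.rand(2_000_000)

t0 = time.perf_counter()
total = 0.0
for v in values:          # each iteration goes through the interpreter
    total += v
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
total_np = values.sum()   # a single call into compiled code
t_numpy = time.perf_counter() - t0

print(f"python loop: {t_loop:.3f}s  numpy sum: {t_numpy:.3f}s")
```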
8
votes
2 answers

Filtering spam from retrieved data

I once heard that filtering spam by using blacklists is not a good approach, since some users searching for entries in your dataset may be looking for particular information from the blocked sources. It would also become a burden to continuously validate…
Rubens
  • 4,117
8
votes
3 answers

How to compare experiments run over different infrastructures

I'm developing a distributed algorithm, and to improve efficiency, it relies both on the number of disks (one per machine) and on an efficient load-balancing strategy. With more disks, we're able to reduce the time spent on I/O; and with an…
Rubens
  • 4,117
8
votes
3 answers

Levenshtein distance vs simple for loop

I have recently begun studying different data science principles, and have had a particular interest as of late in fuzzy matching. As a preface, I'd like to include smarter fuzzy searching in a proprietary language named "4D" at my workplace, so…
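For reference, a standard dynamic-programming Levenshtein implementation (sketched in Python rather than 4D), kept to two rolling rows so memory stays proportional to the shorter string.

```python
# Reference sketch: dynamic-programming Levenshtein distance with two rolling
# rows, i.e. O(len(a) * len(b)) time and O(min(len(a), len(b))) memory.
def levenshtein(a: str, b: str) -> int:
    if len(a) < len(b):
        a, b = b, a                     # ensure b is the shorter string
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
```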
6
votes
2 answers

How to best accomplish high speed comparison of like data?

I frequently attack this problem inefficiently because it's always pretty low on the priority list and my clients are resistant to change until things break. I would like some input on how to speed things up. I have multiple datasets of…
Steve Kallestad
  • 3,208