Highest Voted 'research' Questions - Data Science Stack Exchange

14

votes

3 answers

Why does everyone use BERT in research instead of LLAMA or GPT or PaLM, etc?

It could be that I'm misunderstanding the problem space and the iterations of LLAMA, GPT, and PaLM are all based on BERT like many language models are, but every time I see a new paper on improving language models it takes BERT as a base and adds…

asked Aug 03 '23 at 01:11

Ethan

243
1
2
6

8

votes

1 answer

Which of the NIPS 2014 papers are most significant, and why?

As a newcomer to the field, I find many of the NIPS 2014 papers fascinating, but it is difficult for me to evaluate which ones represent real progress over current approaches. Which papers do you think are most significant and are likely to have a…

machine-learning research state-of-the-art

asked Aug 21 '15 at 18:10

Michael R. Bernstein

189
2

7

votes

1 answer

Why use mean revenue in a split test?

I asked a data science question regarding how to decide on the best variation of a split test on the Statistics section of StackExchange. I hope I will have better luck here. The question is basically, "Why is mean revenue per user the best metric…

research cross-validation

asked Jul 16 '14 at 07:47

Keith

326
2
14

5

votes

4 answers

Where can I find resources and papers regarding Data Science in the area of Public Health?

I'm quite new to Data Science, but I would like to do a project to learn more about it. My subject will be Data Understanding in Public Health, so I want to do some introductory research into public health. I would like to visualize some data with the…

visualization tableau research

asked Feb 20 '15 at 12:21

Claus Machholdt

153
3

5

votes

4 answers

What kind of regression model should I do?

My research question is to examine the effect of "receiving attention" from other members in an online community on "sustained participation" on the website. I decided to measure "sustained participation" of each user by calculating average time…

regression research

asked Jan 15 '17 at 02:04

user27954

51
1

4

votes

5 answers

Classification training using probabilities and not raw classes (factors)

I have a problem where instead of having classes, i.e. a vector of 0s and 1s, I have the probability of an observation belonging to a class. A vector with 0.1, 0.95, 0.2, 0.3, etc. The obvious approach is using regression and it works relatively…

classification research

asked Aug 10 '19 at 22:24

wacax

3,500
4
26
48

4

votes

1 answer

Resources for Promotion/Demotion Strategies for ML Item Recommendation Systems?

We are looking to design a system where specific items or categories of items can be boosted/promoted up or relegated/demoted down the recommendation order. What are the common strategies or standards for doing so? A cursory Google search did not…

machine-learning data-science-model recommender-system reference-request research

asked Jan 20 '23 at 20:28

JPTheEngineer

41
1

4

votes

1 answer

Zero-shot learning for tabular data?

Can anyone point me to methods for zero-shot learning on tabular data? There is some very cool work being done for zero-shot learning on images and text, but I'm struggling to find work being done to extend these techniques to tabular data.

machine-learning classification class-imbalance research zero-shot-learning

asked Jun 14 '22 at 16:04

tensormoby

73
4

4

votes

1 answer

Determining completeness of dataset

I'm hoping you have some research or experience with determining the completeness of a data set. I'm trying to use a Twitter dataset I scraped myself and want to have an indication of its completeness. Obviously, I will miss some data but I am…

dataset statistics research

asked Feb 25 '16 at 16:13

FruitySunrise

41
1

3

votes

1 answer

How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

I have a machine learning/anomaly detection problem. I have a variable Y and several other variables X. The purpose is to quantify the degree of abnormality of the data on Y, but I have to take into account the values of the other variables…

machine-learning anomaly-detection research anomaly

asked Jan 06 '21 at 10:28

AdrienC

31
2

3

votes

1 answer

Identify given patterns in unstructured data like text files

I wasn't sure whether to ask this here or on Stack Overflow, but since I am also seeking research papers/algorithms and not only code, I decided to ask here. When I have a text, I can manually write a regex to find all the possible outputs from what…

text-mining algorithms research

asked Sep 01 '15 at 13:56

Tasos

3,960
5
25
54

3

votes

1 answer

Need advice on a research topic

I am about to choose an ML research topic for my master's thesis, but I am at a dead end. The problem is that while reading research papers, I find solutions but not open problems. For now, I have come up with these ideas: Research neural network…

research

asked Feb 25 '20 at 08:52

Дмитрий

45
2

3

votes

1 answer

How can there be more true positives than positives?

I am currently reading "Learning from Little: Comparison of Classifiers Given Little Training". In section 3, Experiment Results, the following graph is shared: The experiment is described as follows: We begin by examining an example set of results for the average…

machine-learning research

asked Sep 13 '18 at 02:21

Adrien Lemaire

161
4

3

votes

2 answers

What is the loss function defined by Mnih and Hinton in their paper “Learning to Label Aerial Images from Noisy Data”?

In section 3.3 of the paper, they state that they use the cross entropy. Then they define the probability for a label to be a false positive as $\theta_0$ and a false negative as $\theta_1$. They use it to somehow modify the loss function but never…

deep-learning image-recognition research noise

asked Jul 20 '18 at 14:43

Borbag

141
6

3

votes

2 answers

How to measure Entity Ambiguity?

When using or building a system for Entity Linking, is there a well-known measure for the "ambiguity degree" of an entity? Is there some approach to compare named entities by how difficult they are to disambiguate?

nlp text-mining metric named-entity-recognition research

asked Mar 20 '18 at 20:15

Abdulrahman Bres

221
2
15

Questions tagged [research]