Questions tagged [numerical]

32 questions

48 votes, 6 answers

Encoding features like month and hour as categorical or numeric?

Is it better to encode features like month and hour as factor or numeric in a machine learning model? On the one hand, I feel numeric encoding might be reasonable, because time is a forward progressing process (the fifth month is followed by the…
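A third option, besides plain numeric and one-hot (categorical) encoding, is a sine/cosine encoding for cyclic features, which keeps December next to January and 23:00 next to 00:00. A minimal sketch, assuming the period is known (12 for months, 24 for hours); the function name `encode_cyclical` is hypothetical:

```python
import math

def encode_cyclical(value, period):
    """Map a periodic feature (e.g. month with period 12, hour with period 24)
    to a point (sin, cos) on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Hours 23 and 0 end up close together, unlike with a plain numeric encoding,
# where they would be 23 units apart.
print(math.dist(encode_cyclical(23, 24), encode_cyclical(0, 24)))
# Opposite hours (0 and 12) end up as far apart as possible.
print(math.dist(encode_cyclical(12, 24), encode_cyclical(0, 24)))
```

The two output columns replace the original feature; with one-hot encoding, each month or hour would instead become its own indicator column.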

14 votes, 4 answers

How can I dynamically distinguish between categorical data and numerical data?

I know someone who is working on a project that involves ingesting files of data without regard to the columns or data types. The task is to take a file with any number of columns and various data types and output summary statistics on the numerical…
Poisson Fish
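For files of unknown schema read as text, one simple heuristic is to try parsing each column's values as numbers and to treat numeric columns with only a handful of distinct values as categorical codes. A minimal sketch; the threshold `max_levels=10` and the function names are arbitrary, hypothetical choices, and real data (dates, IDs, zip codes) will need more rules:

```python
def looks_numeric(token):
    """True if the string parses as a number (int or float syntax)."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def infer_column_kind(values, max_levels=10):
    """Heuristic: 'numerical' if every non-empty value parses as a number and
    there are more than max_levels distinct values; otherwise 'categorical'."""
    non_empty = [v.strip() for v in values if v.strip() != ""]
    if non_empty and all(looks_numeric(v) for v in non_empty):
        if len(set(non_empty)) > max_levels:
            return "numerical"
    return "categorical"

columns = {
    "age": ["23", "35", "41", "29", "52", "60", "19", "33", "47", "38", "25"],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "rating": ["1", "2", "3", "2", "1"],  # numeric codes, few levels
}
for name, values in columns.items():
    print(name, infer_column_kind(values))
```

Summary statistics can then be computed only for the columns labelled "numerical".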

6 votes, 1 answer

How to estimate the mutual information numerically?

Suppose I have a sample $\{z_i\}_{i\in[0,N]} = \{(x_i, y_i)\}_{i\in[0,N]}$ which comes from a probability distribution $p_z(z)$. How can I use it to estimate the mutual information between X and Y? $MI(X,Y) = \int_Y \int_X …
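One simple way to estimate $MI(X,Y)$ from the sample $\{(x_i, y_i)\}$ is a plug-in estimate: bin the $x_i$ and $y_i$ values, use relative frequencies as estimates of the joint and marginal probabilities, and sum $p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$ over the occupied bins. A minimal sketch in pure Python, assuming equal-width bins (`bins=10` is an arbitrary choice). This estimator is biased and sensitive to the bin count; nearest-neighbour estimators are a common alternative:

```python
import math
import random
from collections import Counter

def mutual_information_hist(xs, ys, bins=10):
    """Plug-in estimate of MI(X, Y) in nats from a 2-D histogram of the sample."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant data: everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))       # counts for p(x, y)
    px, py = Counter(bx), Counter(by)  # counts for the marginals
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi

rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys_indep = [rng.gauss(0, 1) for _ in range(5000)]   # independent of xs
ys_dep = [x + 0.1 * rng.gauss(0, 1) for x in xs]    # strongly dependent on xs
print(mutual_information_hist(xs, ys_indep))  # near 0
print(mutual_information_hist(xs, ys_dep))    # clearly positive
```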

5 votes, 2 answers

What is the intuition behind using Monte Carlo to solve a differential equation

Conceptually, I understand how a numerical method like Monte Carlo is used to solve a definite integral. Because the integral of a function is the area bounded by the curve, the ratio of random points that land inside the curve to the total number of…
Victor
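The area-ratio idea described in the question (the "hit-or-miss" Monte Carlo estimate of a definite integral) can be sketched as follows. This covers only the integral case the question builds on, not the differential equation. The integrand $x^2$ on $[0, 1]$ and the bound `f_max` are illustrative assumptions, and `f` must be non-negative and at most `f_max` on the interval:

```python
import random

def hit_or_miss_integral(f, a, b, f_max, n=100_000, seed=0):
    """Estimate the integral of a non-negative f on [a, b] by sampling points
    uniformly in the box [a, b] x [0, f_max] and counting those under the curve."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0, f_max)
        if y <= f(x):
            hits += 1
    box_area = (b - a) * f_max
    # The fraction of hits estimates (area under curve) / (area of box).
    return box_area * hits / n

estimate = hit_or_miss_integral(lambda x: x * x, 0.0, 1.0, f_max=1.0)
print(estimate)  # close to the exact value 1/3
```

The error shrinks roughly like $1/\sqrt{n}$, so each extra digit of accuracy costs about 100 times more samples.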

4 votes, 3 answers

Purpose of converting continuous data to categorical data

I was reading through a notebook tutorial working with the Titanic dataset, linked here, and noticed that they strongly favored ordinal data over continuous data. For example, they converted both the Age and Fare features into ordinal data bins. I…
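Binning of the kind described (turning Age or Fare into ordinal bins) amounts to mapping each value to the index of the interval it falls in. A minimal sketch using the standard `bisect` module; the cut points below are hypothetical, not necessarily those used in the tutorial:

```python
from bisect import bisect_right

def to_ordinal_bins(values, edges):
    """Map each value to the index of its bin, given sorted inner cut points.
    A value equal to an edge goes into the higher bin."""
    return [bisect_right(edges, v) for v in values]

ages = [4, 17, 22, 35, 48, 63, 80]
age_edges = [16, 32, 48, 64]  # hypothetical cut points giving 5 bins (0..4)
print(to_ordinal_bins(ages, age_edges))  # [0, 1, 1, 2, 3, 3, 4]
```

The bin indices keep the ordering of the original values but discard the distances within each bin.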

4 votes, 1 answer

Do numerical inaccuracies play any role in training neural networks?

Are there publications which mention numerical problems in neural network optimization? (Blog posts, articles, workshop notes, lecture notes, books - anything?) Background of the question: I recently observed a strange phenomenon. When I trained a…
Martin Thoma
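One basic source of such effects is that floating-point addition is not associative, so a reduction performed in a different order (as can happen in parallel GPU kernels) can give slightly different results. A minimal illustration in plain Python, which uses 64-bit floats; training often uses 32-bit or lower precision, where the effect is larger:

```python
import math

# Floating-point addition is not associative: the grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)

# Naive summation accumulates rounding error; math.fsum tracks the lost
# low-order bits and returns the correctly rounded sum.
values = [0.1] * 10
print(sum(values), math.fsum(values))  # 0.9999999999999999 1.0
```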

3 votes, 1 answer

Steps in exploratory methods for mild-sized data with mixed categorical and numerical values?

Experienced in signal/image analysis but new to data science, I was recently challenged with a relatively simple dataset: 100 to 200 items, about 10-20 numerical variables (in the [0-1] or percentage range), with only one variable used at present…

3 votes, 1 answer

How Should I Handle Ordered Features with a Censored Outcome Variable?

I have a dataset with many ordered features, most of which have 3 levels (e.g., 0, 1, 2), and my outcome variable is censored. I’m debating whether to treat these ordinal features as numeric or categorical. If I treat them as categorical, I’m…

2 votes, 1 answer

5-digit number misread analysis

This has nothing to do with number recognition in the classical 'handwritten' sense; the disclaimer is there to avoid this being counted as a duplicate. I have a selection of 96 serial numbers, and a separate selection of >220 serial numbers. Within the larger set…
HDSci

2 votes, 1 answer

MinMaxScaler returned values greater than one

I was looking for a normalization function in sklearn, which will be useful later for logistic regression. Since I have negative values, I chose MinMaxScaler with feature_range=(0, 1) as a parameter. x = MinMaxScaler(feature_range=(0,…
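One common cause of this symptom is fitting the scaler on one set of data (for example, the training split) and then transforming data that contains values outside the fitted minimum and maximum; MinMaxScaler does not clip by default, so such values map outside `feature_range`. A minimal pure-Python sketch of the min-max formula for a single feature shows the effect; the helper names are hypothetical and this is not scikit-learn's implementation:

```python
def fit_min_max(train):
    """Record the minimum and maximum of one feature seen during fitting."""
    return min(train), max(train)

def transform_min_max(values, data_min, data_max, feature_range=(0.0, 1.0)):
    """Min-max scaling using the *fitted* min and max; no clipping is applied."""
    lo, hi = feature_range
    scale = (hi - lo) / (data_max - data_min)
    return [lo + (v - data_min) * scale for v in values]

train = [-5.0, 0.0, 5.0]
test = [-10.0, 2.5, 15.0]  # contains values outside the training range
data_min, data_max = fit_min_max(train)
print(transform_min_max(train, data_min, data_max))  # all within [0, 1]
print(transform_min_max(test, data_min, data_max))   # [-0.5, 0.75, 2.0]
```

If out-of-range outputs are not wanted, recent scikit-learn versions accept `MinMaxScaler(clip=True)`; otherwise, fit the scaler on data that covers the expected range.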

2 votes, 1 answer

Replacing words by numbers in multiple columns of a data frame in R

I want to replace the values in a data set (sample in the picture) with numbers instead of words, e.g., 1 instead of D, -1 instead of R, 0 for all other values. How can I do it with a loop? I know it can be done like this instead: (suppose d is…
Yellow whale

2 votes, 0 answers

How to choose the optimal k in k-prototypes?

I want to analyze a banking dataset that has both numerical and categorical values, which I transform for analysis with k-prototypes. The original dataset: [image omitted] The modified dataset: [image omitted] E.g.: Job (coded 1 to 12 because there are 12 levels). Should I scale the…
Sergi F.

2 votes, 0 answers

Separate discrete and continuous variables

I know how to separate numerical and categorical data as follows: num_data = [cname for cname in df.columns if df[cname].dtypes in ['int64', 'float64']] cat_data = [cname for cname in df.columns if df[cname].dtypes == 'object'] Now I want to…
spectre
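Once the numerical columns are known, one rough heuristic for splitting them further is: a column is discrete if all its values are whole numbers and it has few distinct values, and continuous otherwise. A minimal sketch; `max_unique=20` is an arbitrary assumption, and the function could be applied to each column in the numerical list (for example to `df[cname].dropna().tolist()` in pandas):

```python
def is_discrete(values, max_unique=20):
    """Heuristic: a numeric column is discrete if every value is a whole number
    and it has at most max_unique distinct values; otherwise continuous."""
    all_whole = all(float(v).is_integer() for v in values)
    return all_whole and len(set(values)) <= max_unique

children = [0, 1, 2, 1, 3, 0]
fare = [7.25, 71.28, 8.05, 53.1]
print(is_discrete(children), is_discrete(fare))  # True False
```

High-cardinality integer columns (counts, IDs) fall on the continuous side with this rule, which may or may not be what you want.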

1 vote, 1 answer

Partial numerical array - pattern matching

I have a linear numerical array, source, and I want to find/match the array test as a pattern: source = [39,36,23,21,28,36,30,22,34,37] test = [36,23,21,28]. We can use brute force or a similar method to find the exact match, by checking the test array from…
Mironline
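The brute-force exact match mentioned in the question can be written as a sliding-window comparison. It runs in O(n*m) time for a source of length n and a pattern of length m; linear-time alternatives such as Knuth-Morris-Pratt exist. A minimal sketch using the arrays from the question (the function name is hypothetical):

```python
def find_subarray(source, pattern):
    """Return every start index where pattern occurs contiguously in source
    (brute force: compare the pattern against each window of the same length)."""
    m = len(pattern)
    return [i for i in range(len(source) - m + 1) if source[i:i + m] == pattern]

source = [39, 36, 23, 21, 28, 36, 30, 22, 34, 37]
test = [36, 23, 21, 28]
print(find_subarray(source, test))  # [1]
```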

1 vote, 0 answers

How to do non-arithmetic operations in Python 3

Let's say a=3, b=4, and c is an unknown constant: a=3; b=4; F=0; F=a*b*c; print(F). This raises an error. I want 12*c or 12c.
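Plain Python variables must hold values, so `F = a*b*c` presumably fails with a NameError because `c` was never assigned. To keep `c` as an unknown, one option is a symbolic-math library; the sketch below assumes the third-party SymPy package is installed:

```python
import sympy

a = 3
b = 4
c = sympy.symbols("c")  # c is a symbol, not a number, so the product stays symbolic
F = a * b * c
print(F)  # 12*c
```

A concrete value can be substituted later, e.g. `F.subs(c, 2)` gives 24.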