Questions tagged [bootstraping]

20 questions
7
votes
1 answer

Nested cross-validation pipeline and confidence intervals

I'm hoping someone can help me think through this. I've come across a lot of different resources on nested CV, but I'm confused about how to go about model selection and how to construct appropriate confidence intervals for the training…
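For readers new to nested CV, here is a minimal sketch of the loop structure in plain Python (standard library only). The 1-D k-nearest-neighbours regressor, the grid of k values, and all function names are illustrative assumptions, not anything from the question. The inner folds choose the hyperparameter using only the outer-training data, and each outer fold then scores that choice on data the selection never saw.

```python
import random
import statistics

def kfold(indices, k, rng):
    """Shuffle indices and deal them into k folds of near-equal size."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, k):
    """1-D k-nearest-neighbours regression: mean target of the k closest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return statistics.mean(train_y[i] for i in nearest)

def mse(k, xs, ys, train, test):
    """'Fit' on the train indices (k-NN just stores them) and return MSE on test."""
    tx = [xs[i] for i in train]
    ty = [ys[i] for i in train]
    return statistics.mean((knn_predict(tx, ty, xs[i], k) - ys[i]) ** 2 for i in test)

def nested_cv(xs, ys, grid, outer_k=5, inner_k=3, seed=0):
    """Return a list of (outer test MSE, chosen k), one entry per outer fold."""
    rng = random.Random(seed)
    results = []
    for test in kfold(range(len(xs)), outer_k, rng):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        inner = kfold(train, inner_k, rng)  # inner folds use outer-training data only

        def inner_score(k):
            # average validation MSE of hyperparameter k over the inner folds
            return statistics.mean(
                mse(k, xs, ys, [i for j, f in enumerate(inner) if j != v for i in f], fold)
                for v, fold in enumerate(inner)
            )

        best_k = min(grid, key=inner_score)
        # refit on the whole outer-training set with the chosen k, score on the outer fold
        results.append((mse(best_k, xs, ys, train, test), best_k))
    return results

if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(60)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    for score, k in nested_cv(xs, ys, grid=[1, 3, 5, 9]):
        print(f"chosen k={k}, outer MSE={score:.3f}")
```

The outer scores estimate the performance of the whole selection procedure rather than of one fixed model. Because the outer folds share training data, the spread of these scores is only a rough guide to uncertainty.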
4
votes
3 answers

List of samples that each tree in a random forest is trained on in Scikit-Learn

In Scikit-learn's random forest, you can set bootstrap=True and each tree would select a subset of samples to train on. Is there a way to see which samples are used in each tree? I went through the documentation about the tree estimators and all the…
3
votes
0 answers

Difference between bagging and bootstrap aggregating

The bootstrap originates with Efron, and Efron and Tibshirani wrote a book about it. The bootstrap procedure estimates the standard error of a statistic s(x): B bootstrap samples are generated from the original data, and finally the standard deviation of the…
martin
  • 329
  • 3
  • 14
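The standard-error procedure described in the excerpt can be sketched in a few lines of plain Python. The function name and defaults are assumptions for illustration: draw B samples of size n with replacement, evaluate the statistic on each, and take the standard deviation of the B replicates.

```python
import random
import statistics

def bootstrap_se(data, statistic, B=1000, seed=0):
    """Bootstrap estimate of the standard error of statistic(data).

    Draws B bootstrap samples (size n, with replacement) from data,
    evaluates the statistic on each, and returns the standard deviation
    of the B replicates (divisor B - 1).
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(B)]
    return statistics.stdev(replicates)

if __name__ == "__main__":
    x = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
    print(bootstrap_se(x, statistics.mean))    # SE of the mean
    print(bootstrap_se(x, statistics.median))  # the same code works for any statistic
```

Bagging (short for bootstrap aggregating) reuses the same resampling step, but fits a model on each bootstrap sample and averages the models' predictions instead of computing a standard deviation.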
3
votes
2 answers

How does bootstrapping work for prediction intervals?

I'm experimenting with prediction intervals (PIs) on univariate time-series data using the skforecast Python package. The documentation states, under "Prediction intervals": A prediction interval defines the interval within which the true value of…
Mario
  • 571
  • 1
  • 6
  • 24
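As a generic illustration of the idea, and not skforecast's actual implementation, here is a sketch of residual-bootstrap prediction intervals in plain Python. It assumes a naive (random-walk) forecaster whose point forecast is the last observed value; the function names, the 80% coverage default, and the percentile interpolation are illustrative choices. Each simulated path adds resampled in-sample residuals step by step, so the interval widens with the horizon.

```python
import random

def percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def bootstrap_pi(series, steps, n_boot=1000, coverage=80, seed=0):
    """Residual-bootstrap prediction intervals for a naive forecaster.

    The naive model forecasts the last observed value; its in-sample
    one-step residuals are y[t] - y[t-1]. Each simulated path starts at
    the last observation and adds one resampled residual per step, so
    uncertainty accumulates over the horizon. Returns one
    (lower, point, upper) tuple per step.
    """
    rng = random.Random(seed)
    residuals = [b - a for a, b in zip(series, series[1:])]
    last = series[-1]
    paths = []
    for _ in range(n_boot):
        level, path = last, []
        for _ in range(steps):
            level += rng.choice(residuals)
            path.append(level)
        paths.append(path)
    tail = (100 - coverage) / 2
    intervals = []
    for h in range(steps):
        vals = sorted(path[h] for path in paths)
        intervals.append((percentile(vals, tail), last, percentile(vals, 100 - tail)))
    return intervals

if __name__ == "__main__":
    rng = random.Random(1)
    y = [100.0]
    for _ in range(150):
        y.append(y[-1] + rng.gauss(0, 1))
    for h, (lo, point, hi) in enumerate(bootstrap_pi(y, steps=5), start=1):
        print(f"h={h}: point={point:.2f}, 80% PI=[{lo:.2f}, {hi:.2f}]")
```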
2
votes
2 answers

Resampling train and test data in R

I need to try out a few different machine learning methods (SVM, logistic regression, etc.) that predict a true/false value, and record the AUC and accuracy of their predictions. I have already done that successfully; now I have a two…
znoris007
  • 21
  • 1
2
votes
1 answer

Question on bootstrap sampling

I have a corpus of manually annotated (aka "gold standard") documents and a collection of NLP system annotations on the text from the corpus. I want to do bootstrap sampling of the system and gold-standard annotations to approximate a mean and standard error…
horcle_buzz
  • 201
  • 1
  • 6
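One common way to set this up is to resample whole documents, so that annotations from the same document stay together. The sketch below assumes the system-vs-gold comparison has already been reduced to hypothetical per-document (tp, fp, fn) counts and that the metric of interest is micro-averaged F1; both are assumptions for illustration, and the example counts are made up.

```python
import random
import statistics

def micro_f1(tp, fp, fn):
    """F1 from pooled true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def document_bootstrap(doc_counts, B=1000, seed=0):
    """Resample documents with replacement and recompute micro-averaged F1.

    doc_counts: one (tp, fp, fn) tuple per document, from comparing the
    system's annotations to the gold standard. Returns the mean and the
    bootstrap standard error of F1 over the B resamples.
    """
    rng = random.Random(seed)
    n = len(doc_counts)
    replicates = []
    for _ in range(B):
        sample = rng.choices(doc_counts, k=n)  # whole documents, with replacement
        tp = sum(d[0] for d in sample)
        fp = sum(d[1] for d in sample)
        fn = sum(d[2] for d in sample)
        replicates.append(micro_f1(tp, fp, fn))
    return statistics.mean(replicates), statistics.stdev(replicates)

if __name__ == "__main__":
    docs = [(8, 2, 1), (5, 0, 3), (12, 4, 2), (3, 1, 1), (9, 3, 0)]  # made-up counts
    mean_f1, se_f1 = document_bootstrap(docs)
    print(f"F1 = {mean_f1:.3f} +/- {se_f1:.3f} (bootstrap SE)")
```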
1
vote
0 answers

How are identical observation sets treated in random forests with bootstrapping?

Suppose I have an extremely small dataset with only 4 observations, and I build a random forest with a fairly large number of trees, say 200. In that case, identical sample sets would be used to fit different trees, right? Is that OK? Even…
jlee
  • 11
  • 1
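The concern about identical sample sets can be checked by counting. With n = 4 observations, a bootstrap sample of size 4 is a multiset, and only C(2n - 1, n) = C(7, 4) = 35 distinct multisets exist, so 200 trees must reuse some of them. A small standard-library sketch (function names are illustrative):

```python
import itertools
import math
import random
from collections import Counter

def distinct_bootstrap_samples(n):
    """All distinct bootstrap samples (multisets of size n) from n observations."""
    return list(itertools.combinations_with_replacement(range(n), n))

def simulate_tree_samples(n, n_trees, seed=0):
    """Draw one bootstrap sample per tree and count how often each multiset occurs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trees):
        sample = tuple(sorted(rng.choices(range(n), k=n)))  # order does not matter
        counts[sample] += 1
    return counts

if __name__ == "__main__":
    n, n_trees = 4, 200
    print(len(distinct_bootstrap_samples(n)))  # 35
    print(math.comb(2 * n - 1, n))             # 35, from the closed form C(2n - 1, n)
    counts = simulate_tree_samples(n, n_trees)
    print(len(counts), "distinct samples used by", n_trees, "trees")
    print(counts.most_common(3))
```

The 35 multisets are not equally likely: drawing all four distinct observations has probability 4!/4^4 ≈ 0.094, while drawing one particular observation four times has probability 1/256. Also, when fewer than all features are considered at each split, identical row samples can still produce different trees, because the candidate features are chosen at random.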
1
vote
1 answer

nnet in caret: bootstrapping or cross-validation?

I want to train a shallow neural network with one hidden layer using nnet in caret. In trainControl, I used method = "cv" to perform 3-fold cross-validation. A snippet of the code and the results summary are below. myControl <- trainControl(## 3-fold CV …
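The difference between the two resampling schemes is easiest to see in the index sets they produce. This sketch is in Python rather than R, the function names are assumptions, and caret's own implementation is not shown. It generates k-fold splits, where every observation is held out exactly once, and bootstrap splits, where the model trains on n draws with replacement and is evaluated on the out-of-bag observations. In caret these roughly correspond to trainControl(method = "cv") and trainControl(method = "boot").

```python
import random

def kfold_splits(n, k, seed=0):
    """k-fold CV: shuffle once, deal indices into k folds, hold each fold out once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test

def bootstrap_splits(n, b, seed=0):
    """Bootstrap: train on n draws with replacement, test on out-of-bag rows."""
    rng = random.Random(seed)
    for _ in range(b):
        train = sorted(rng.choices(range(n), k=n))  # may contain repeated rows
        in_bag = set(train)
        test = [j for j in range(n) if j not in in_bag]
        yield train, test

if __name__ == "__main__":
    for train, test in kfold_splits(9, 3):
        print("cv   train", train, "test", test)
    for train, test in bootstrap_splits(9, 3):
        print("boot train", train, "test", test)
```

On average about (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the observations are out-of-bag in each bootstrap resample, so each bootstrap evaluation uses a smaller test set than 3-fold CV's one third only by a small margin, but the training sets contain repeated rows.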
1
vote
1 answer

About confidence/prediction intervals: parametric methods vs. non-parametric (bootstrap) methods

Regarding methods for finding confidence and/or prediction intervals in, say, a regression problem, I know two main options: checking whether the distribution of the estimates/predictions is normal and applying well-known Gaussian-based methods to find…
German C M
  • 2,744
  • 7
  • 18
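To make the two options concrete, here is a sketch comparing a percentile bootstrap interval for the mean with a large-sample normal interval (z-based rather than t-based), in plain Python. The function names, the 95% level, and B = 2000 are illustrative assumptions; the percentile method is only one of several bootstrap interval constructions.

```python
import random
import statistics

def percentile_bootstrap_ci(data, statistic, level=0.95, B=2000, seed=0):
    """Percentile bootstrap CI: empirical quantiles of B bootstrap replicates."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(B))
    k = round((1 - level) / 2 * B)  # number of replicates cut from each tail
    return reps[k], reps[B - k - 1]

def normal_ci_mean(data, level=0.95):
    """Large-sample (z-based) CI for the mean, assuming approximate normality."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    m = statistics.mean(data)
    se = statistics.stdev(data) / len(data) ** 0.5
    return m - z * se, m + z * se

if __name__ == "__main__":
    rng = random.Random(3)
    skewed = [rng.expovariate(1.0) for _ in range(50)]  # right-skewed sample
    print("bootstrap:", percentile_bootstrap_ci(skewed, statistics.mean))
    print("normal:   ", normal_ci_mean(skewed))
```

The bootstrap version needs no formula for the standard error, so the same function works for other statistics (for example, passing statistics.median), whereas the normal interval above is specific to the mean.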
1
vote
0 answers

Block bootstrapping a portfolio of stock indices with different inception dates

Imagine a data frame with multiple columns, where each column is a time series holding daily returns for an individual stock index. Additionally, the data frame holds a date column. I want to do block bootstrapping to create synthetic data. Each…
Andi
  • 111
  • 1
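A minimal moving-block bootstrap along these lines is sketched below in plain Python. It assumes each date is a dict of returns keyed by hypothetical index names, with None before an index's inception date. Whole rows are resampled, so all indices in a block share the same dates. How to treat the pre-inception gaps (keep them, drop such blocks, or restrict to the common period) is a modelling choice; this sketch simply keeps the None values. The date column is not resampled, so synthetic dates would need to be reassigned.

```python
import random

def moving_block_bootstrap(rows, block_len, seed=0):
    """Moving-block bootstrap of a multivariate daily series.

    rows: list of per-date records, e.g. {"IDX_A": 0.01, "IDX_B": None},
    where None marks a date before that index's inception (an assumption).
    Whole rows are resampled, so every column in a block comes from the
    same dates (cross-sectional dependence is kept), and blocks of
    consecutive dates keep short-range serial dependence.
    """
    rng = random.Random(seed)
    n = len(rows)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(rows)")
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)  # start of a full block
        out.extend(rows[start:start + block_len])
    return out[:n]  # trim the last block so the synthetic series has length n

if __name__ == "__main__":
    rng = random.Random(7)
    # made-up returns; the hypothetical "IDX_B" only exists from day 5 onwards
    history = [
        {"IDX_A": rng.gauss(0, 0.01), "IDX_B": None if t < 5 else rng.gauss(0, 0.012)}
        for t in range(30)
    ]
    synthetic = moving_block_bootstrap(history, block_len=5)
    print(len(synthetic), synthetic[:2])
```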
1
vote
1 answer

Model evaluation approach allowing manual experimentation without data leakage

In supervised machine learning, are there evaluation approaches besides a fixed holdout test set that allow me, as a scientist, to manually compare preprocessing approaches without leaking information from the test set? For…
thomas8wp
  • 111
  • 1
1
vote
0 answers

What is the best way to combine cross-validation and bootstrapping for one application?

We intend to model data with non-parametric covariate splines, and we would like to understand the uncertainty of the parameter estimates/response estimates. Currently, we use cross-validation to select the optimal smoothness of our spline models…
1
vote
0 answers

How to perform bootstrap validation on a CART decision tree?

I have a relatively small dataset (n = 500) on which I am training a CART decision tree. It has about 30 variables, and the outcome has 3 classes. I am using CART for interpretability, as what I am interested in is sharing and…
Eric Yamga
  • 11
  • 2
1
vote
0 answers

Evaluate Dendrogram Statistical Significance

I have N = 21 objects, each with about 80 possible non-NaN descriptors. I carried out hierarchical clustering on the objects and obtained this dendrogram. I want some kind of 'confidence' index for the dendrogram or for each node. I saw…
Mirko
  • 111
  • 5
1
vote
0 answers

Stratified sampling - use of proxy variable

To split the data into train/test/validation sets, I use stratified sampling. Is it appropriate to define strata using information extracted from the dataset, e.g. using machine learning to model a proxy variable that defines the strata? My worry is…
holoubekm
  • 11
  • 1
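For reference, a stratified train/validation/test split can be sketched as below in plain Python; the 70/15/15 fractions and the function name are assumptions. It takes one precomputed stratum label per row, so it works the same whether the label is an observed variable or a modelled proxy. The sketch does not address whether building such a proxy from the data is appropriate.

```python
import random
from collections import defaultdict

def stratified_split(strata, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split row indices so every stratum appears in each part in roughly
    the given proportions. strata: one hashable stratum label per row."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i, label in enumerate(strata):
        by_stratum[label].append(i)
    parts = [[] for _ in fractions]
    for idx in by_stratum.values():
        rng.shuffle(idx)
        n, start = len(idx), 0
        for p, frac in enumerate(fractions):
            # the last part takes whatever is left, so no row is lost to rounding
            end = n if p == len(fractions) - 1 else min(n, start + round(frac * n))
            parts[p].extend(idx[start:end])
            start = end
    return [sorted(part) for part in parts]

if __name__ == "__main__":
    labels = ["low"] * 60 + ["high"] * 40
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))
```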