I have collected data for a PhD thesis and need help building a road map for the analytical and statistical analysis. The PhD is not itself in statistics or machine learning, but I would like to understand which steps and types of analysis are expected when analysing data for an advanced degree. In general, how should I approach such a problem?

The data I have collected comprise 623 observations, with one continuous dependent variable and 13 independent variables (continuous, categorical, and ordinal) that were chosen based on the researcher's experience and a literature review.

I plan to run several regression analyses to predict the dependent variable and to study the factors that affect it (whether each effect is positive or negative, and its magnitude). I have tried multiple linear regression, including different transformations of the independent variables. On the other hand, I am not sure whether I should also study each independent variable over time and forecast its values over the time horizon.

Here are the steps in my mind so far:

  1. Plotting scatter plots of each independent variable against the dependent variable to identify outliers and to check whether the model is linear, including with respect to its coefficients

  2. Removing the potential outliers

  3. Splitting the data into two sets: one to build the model and one to validate it afterwards.

    If the model is linear then:

  4. Performing the multiple linear regression

  5. Performing the multiple linear regression including different transformations to enhance the model

  6. Validating the model

  7. Doing the quantile regression

  8. Applying supervised machine learning methods, etc.

If the model is not linear, I may instead need to use non-linear statistical techniques.
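As a rough illustration of steps 3, 4, and 6, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data are synthetic stand-ins for the 623 real observations, only three continuous predictors are used instead of 13, and the 80/20 split and the validation metrics (RMSE and R^2) are just one common choice among several.

```python
import numpy as np

# Synthetic stand-in for the real data: 623 rows, 3 hypothetical predictors.
rng = np.random.default_rng(seed=0)
n_obs = 623
X = rng.normal(size=(n_obs, 3))
true_coef = np.array([2.0, -1.0, 0.5])  # assumed effects, for illustration only
y = 1.0 + X @ true_coef + rng.normal(scale=0.5, size=n_obs)

# Step 3: hold out 20% of the rows for validation.
order = rng.permutation(n_obs)
n_train = int(0.8 * n_obs)
train_idx, valid_idx = order[:n_train], order[n_train:]


def add_intercept(features):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(features)), features])


# Step 4: ordinary least squares fitted on the training rows only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Step 6: score the fitted model on rows it has not seen.
pred = add_intercept(X[valid_idx]) @ coef
residuals = y[valid_idx] - pred
rmse = float(np.sqrt(np.mean(residuals ** 2)))
total_ss = float(np.sum((y[valid_idx] - y[valid_idx].mean()) ** 2))
r2 = 1.0 - float(np.sum(residuals ** 2)) / total_ss

print("intercept and slopes:", np.round(coef, 2))
print(f"validation RMSE = {rmse:.3f}, validation R^2 = {r2:.3f}")
```

The sign and size of each fitted slope correspond to the "positive, negative, and magnitude" question above. Categorical and ordinal predictors would need to be encoded before they can enter a design matrix like this one.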

Any feedback would be highly appreciated. My goal is to build a clear and robust road map for this part of the work.

Amir

1 Answer

Typically, quantitative analysis is planned and performed based on the research study's goals. Focusing on the research goals and the corresponding research questions, the researcher proposes a model (or several models) and a set of hypotheses associated with the model(s). The model(s) and the types of their elements usually dictate (or at least suggest) the quantitative approaches that make sense in a particular situation. For example, if your model includes latent variables, you would have to use methods appropriate for them (e.g., structural equation modeling). Otherwise, you can apply a variety of other methods, such as time series analysis or, as you mentioned, multiple regression and machine learning. For more details on a research workflow with latent variables, see also section #3 in my relevant answer.

One last note: whatever methods you use, pay close attention to two very important aspects: performing a full-scale exploratory data analysis (EDA) (see my relevant answer) and designing and performing your analysis in a reproducible-research fashion (see my relevant answer).
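To make these two points a little more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a full EDA: the column is synthetic, the helper names `make_column` and `summarize` are hypothetical, and the 1.5 * IQR fences are just one common convention for flagging values that deserve a closer look.

```python
import random
import statistics


def make_column(seed, n_obs=623):
    """Generate a hypothetical continuous variable; a fixed seed makes it repeatable."""
    generator = random.Random(seed)
    column = [generator.gauss(50.0, 10.0) for _ in range(n_obs)]
    column[0] = 150.0  # planted extreme value so the check has something to find
    return column


def summarize(column):
    """Return basic summary statistics plus values outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(column),
        "mean": statistics.fmean(column),
        "stdev": statistics.stdev(column),
        "median": median,
        "iqr": iqr,
        "flagged": [x for x in column if x < low or x > high],
    }


values = make_column(seed=42)
summary = summarize(values)
for name, value in summary.items():
    print(name, value)
```

Because the seed is fixed, rerunning the script reproduces the same column and the same summary, which is the kind of repeatability that a reproducible workflow aims for. Flagged values are candidates for inspection rather than automatic removal.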

Aleksandr Blekh