I have used various machine learning algorithms in several projects at university, and I have attended some inspiring lectures where industrial companies presented how they use machine learning, data mining, etc. in their work. I mostly use Python, and I have previously used libraries such as sklearn. My problem is that I have great difficulty understanding the role of built-in algorithms versus building them completely from scratch with pure coding and math, i.e. using theoretical machine learning knowledge to do the work yourself. I understand that doing everything yourself is constrained by time/money/resources, and that it often makes no sense to reinvent something that has already been heavily optimized by others.
I keep feeling that using sklearn's built-in random forest classifier, or xgboost in Python, is kind of cheating. All I am doing is preparing the data, cleaning it into the right formats, and maybe doing some feature engineering, initial plotting, and statistical analysis. Once that is done, I simply feed the data into a pre-made algorithm that does everything behind the scenes and spits out predictions (see the sketch below). I feel that I am not really doing anything, that I am not applying the knowledge gained in the exploratory analysis, and that none of the patterns I found in the data are being used. Still, I hear from big companies that they use xgboost and sklearn, and I can see them actively being used in Kaggle competitions.
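To make it concrete, here is a minimal sketch of the workflow I mean, using a toy dataset as a stand-in for my own cleaned data (the dataset and parameter choices are just placeholders). All "my" work happens before the last few lines; the pre-built classifier does the rest:

```python
# A minimal sketch of the workflow described above: my own effort goes into
# preparing X and y, and the pre-built classifier handles the modelling.
from sklearn.datasets import load_iris        # stand-in for my own cleaned dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# In a real project this step is the cleaning / feature engineering I describe;
# here a toy dataset takes its place.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pre-made algorithm: three lines replace all of the underlying math.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

print("Test accuracy:", accuracy_score(y_test, predictions))
```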
Almost every website I find only provides examples using these built-in libraries and does not go into any deeper math or statistics at all. I really enjoy working with machine learning, but I have a strong feeling that I am completely missing the "professional" way of doing things. I know there are many books on theoretical machine learning, but almost everyone online still seems to just use pre-made algorithms. I have been struggling with this for about a year now, and the validity of these pre-made algorithms in serious industrial/business/academic use is still not clear to me.
EDIT: To be more specific, my question is: how are these libraries/tools viewed in a professional/industrial/academic context compared to actually building a model yourself? Are they just a "quick and easy" way for students and amateurs to start learning machine learning and data mining, or are they in fact more powerful than I realize, and not merely an alternative but a viable solution for professionals?
The motivation behind the single question above can be elaborated through the questions I keep asking myself, the very questions that started this confusion: Is it cheating to use these models? In which situations would you use a pre-built library, and when should you avoid it? How do I combine the knowledge gained from the data analysis I did before modelling with these pre-built classifiers?