For detecting the make and model of cars from images with high accuracy across a large number of classes, I would recommend a convolutional neural network (CNN) architecture tailored for fine-grained image recognition.
Some key elements to consider:
Use a model such as ResNet50 or VGG16, pre-trained on ImageNet, as the base. This initializes the network with learned features that transfer well to other image recognition tasks.
Add custom classification layers on top to classify the 50 makes x 50 models = 2,500 classes. These layers should have enough capacity to differentiate the fine-grained details.
Use a dataset of car images labelled with make/model for training. Consider augmenting it with rotations, crops, lighting changes, etc. to increase diversity.
Fine-tune just the classifier portion first. Then fine-tune some of the later convolutional blocks along with the classifier for better specialization.
Use softmax cross-entropy loss, optionally with label smoothing regularization, which can help when there are many visually similar fine-grained classes.
Accuracy above 90% should be reachable for 2,500 classes with a well-tuned CNN, provided each class has enough training images. YOLO is an object detector, so it is better suited to localization with bounding boxes, which is useful if you also want to find where the cars are in the image. For pure make/model classification, a regular classification CNN would be more accurate.
There are some good open-source datasets for training a model to classify car make and model:
Stanford Cars Dataset: Contains 16,185 images of 196 classes of cars. It has less diversity but high image quality and standardized cropping.
CompCars Dataset: It contains web-nature and surveillance-nature images and supports both car classification and fine-grained recognition tasks. The web-nature part covers 163 car makes and 1,716 car models, with 136,726 images of entire cars and 27,618 images of car parts. The surveillance-nature part adds 50,000 front-view images.
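VehicleID Dataset: A large dataset with 221,763 images of 26,267 vehicles captured by surveillance cameras. It has less standardized images but huge diversity. It was built for vehicle re-identification, so only a subset is labelled with car model information.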
Car Make and Model Recognition Dataset: Created from Google image search results, it contains 63,000 low-resolution images across 183 car makes and models. Helpful for evaluating real-world performance.
Some pre-processing is needed, such as alignment, cropping, and colour correction. To reach 50 makes x 50 models = 2,500 classes, you may need to consolidate some of the more granular classes in these datasets and collect or augment more data for under-represented classes.
Start with the CompCars or VehicleID dataset, as they are larger and closer to your needs, then fine-tune on the Stanford Cars dataset for better performance.
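As a rough illustration of consolidating granular classes and compensating for under-represented ones, here is a small sketch in plain Python. The label strings and the `CONSOLIDATE` mapping are made-up examples, not taken from any of the datasets above, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from raw dataset labels to one consolidated
# (make, model) pair, e.g. merging body styles or model years.
CONSOLIDATE = {
    "Audi A4 Sedan 2012": ("Audi", "A4"),
    "Audi A4 Avant 2011": ("Audi", "A4"),
    "BMW 3 Series Sedan 2012": ("BMW", "3 Series"),
}


def consolidate(raw_labels):
    """Map raw labels to consolidated (make, model) pairs."""
    return [CONSOLIDATE[label] for label in raw_labels]


def build_class_index(labels):
    """Assign a stable integer id to each distinct (make, model) pair."""
    return {pair: i for i, pair in enumerate(sorted(set(labels)))}


def sampling_weights(labels):
    """Per-image weights inversely proportional to class frequency,
    so every class contributes the same total weight."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]
```

Weights like these could, for example, be passed to `torch.utils.data.WeightedRandomSampler` so that rare classes are drawn about as often as common ones. Collecting more real images for those classes is still preferable when possible.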
I hope it helps!