A form of signal processing in which the input is an image, usually treated as a two-dimensional (or multidimensional) signal. This processing may include image restoration and enhancement (in particular, pattern recognition and projection).
Questions tagged [image-recognition]
250 questions
25
votes
5 answers
Convolutional neural network overfitting. Dropout not helping
I am playing a little with convnets. Specifically, I am using the Kaggle cats-vs-dogs dataset, which consists of 25,000 images labeled as either cat or dog (12,500 each).
I've managed to achieve around 85% classification accuracy on my test set,…
Juan Antonio Gomez Moriano
23
votes
8 answers
Rationale behind most published works in medical imaging trying to reduce false positives
In medical image processing, most published works try to reduce the false positive rate (FPR), while in reality false negatives are more dangerous than false positives. What is the rationale behind this?
Hunar
21
votes
3 answers
Feature extraction of images in Python
In my class I have to create an application using two classifiers to decide whether an object in an image is an example of phylum Porifera (sea sponge) or some other object.
However, I am completely lost when it comes to feature extraction techniques…
Jeremy Barnes
21
votes
3 answers
Image resizing and padding for CNN
I want to train a CNN for image recognition. The training images do not have a fixed size. I want the input size for the CNN to be 50x100 (height x width), for example. When I resize some small images (for example 32x32) to the input size, the content…
Odgiiv
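For readers skimming this question, here is a minimal sketch of one common way to reach a fixed 50x100 input without stretching the content: pad the image to the target aspect ratio first, then resize. It is not taken from the question or its answers. The helper names `pad_to_aspect`, `resize_nearest` and `letterbox` are hypothetical, and nearest-neighbour resizing in plain NumPy is an assumption chosen to keep the example dependency-free.

```python
import numpy as np

def pad_to_aspect(img, target_h, target_w, fill=0):
    """Center `img` on the smallest canvas that has the target aspect ratio."""
    h, w = img.shape[:2]
    if h * target_w > w * target_h:
        # Image is too tall for the target ratio: widen the canvas (ceil division).
        new_h, new_w = h, -(-h * target_w // target_h)
    else:
        # Image is too wide (or already matches): heighten the canvas.
        new_h, new_w = -(-w * target_h // target_w), w
    canvas = np.full((new_h, new_w) + img.shape[2:], fill, dtype=img.dtype)
    top, left = (new_h - h) // 2, (new_w - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize using integer index maps."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

def letterbox(img, out_h=50, out_w=100):
    """Pad to the target aspect ratio, then resize to (out_h, out_w)."""
    return resize_nearest(pad_to_aspect(img, out_h, out_w), out_h, out_w)

# A 32x32 image of ones ends up centered in a 50x100 frame with zero padding.
small = np.ones((32, 32), dtype=np.uint8)
out = letterbox(small, 50, 100)
print(out.shape)
```

Whether padding, cropping, or plain stretching works best depends on the dataset; this only illustrates the padding option.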
18
votes
2 answers
How many images per class are sufficient for training a CNN
I'm starting a project where the task is to identify sneaker types from images. I'm currently reading into TensorFlow and Torch implementations. My question is: how many images per class are required to reach a reasonable classification…
Feynman27
13
votes
1 answer
Reason for square images in deep learning
Most of the advanced deep learning models like VGG, ResNet, etc. require square images as input, usually with a pixel size of $224 \times 224$.
Is there a reason why the input has to be of equal shape, or can I build a convnet model with say $100 \times 200$ as…
spore234
10
votes
2 answers
How does the bounding box regressor work in Fast R-CNN?
In the Fast R-CNN paper (https://arxiv.org/abs/1504.08083) by Ross Girshick, the bounding box parameters are continuous variables. These values are predicted using a regression method. Unlike other neural network outputs, these values do not represent…
Saptarshi Roy
10
votes
2 answers
How can I find out what class each of the columns in the probabilities output correspond to using Keras for a multi-class classification problem?
I'm using transfer learning to build an image recognition model using a pre-trained VGG network in Keras and excluding the final fully-connected layer to get the output weights. I'm then using these output weights to feed into my new model which…
Ryan Chase
10
votes
5 answers
Dimension-Hopping in Machine Learning
What is the dimension-hopping problem in machine learning (occurring in convolutional neural networks and image recognition)? I have googled it, but all I get is information on the physics of material shape deformation. It will be more helpful…
Saurabh Jain
9
votes
1 answer
How can you build a model that extracts data out from receipts?
I'm trying to build a model that is capable of identifying information on receipts and invoices.
I have used the Google Cloud Vision API for text extraction from the receipt, but the problem is that it just returns all the text from a receipt. I am looking to…
user_12
9
votes
1 answer
How does the YOLO algorithm detect objects if the grid size is way smaller than the object in the test image?
In the YOLO algorithm, how do these grids output a prediction if some grids only see a small black portion of the car, when the model was trained on datasets with full images?
Rishi Swethan
9
votes
2 answers
Ways to reconstruct shuffled pixels of a video file?
Suppose that you have a video file whose pixel order has been shuffled once. That is, a random order has been defined once and applied to all frames.
Is there a known approach for retrieving the initial order of pixels?
I have some ideas…
Denis Dollfus
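To make the setup in this question concrete, here is a small sketch of what "a random order defined once and applied to all frames" could look like in NumPy. It is an illustration under assumptions, not the asker's code: the frame dimensions and the variable names are made up, and the video is replaced by random data. It shows only that the shuffle is trivially invertible when the permutation is known. It does not address the actual question of recovering an unknown permutation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny "video": 4 grayscale frames of 6x8 pixels.
frames = rng.integers(0, 256, size=(4, 6, 8), dtype=np.uint8)
n_frames, height, width = frames.shape

# One fixed permutation of pixel positions, reused for every frame.
perm = rng.permutation(height * width)
flat = frames.reshape(n_frames, height * width)
shuffled = flat[:, perm].reshape(n_frames, height, width)

# With the permutation known, argsort gives its inverse.
inverse = np.argsort(perm)
restored = shuffled.reshape(n_frames, -1)[:, inverse].reshape(frames.shape)

print(np.array_equal(restored, frames))
```

Because the same permutation is applied to every frame, each shuffled position carries the full time series of one original pixel, which is the structure any recovery method would have to exploit.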
9
votes
1 answer
What is the distribution of categories in the ImageNet training set (ILSVRC2012)?
http://arxiv.org/pdf/1409.0575v3.pdf Table 2
says there are 1,281,167 images and 732-1300 per class in the ILSVRC2012 training set.
Ideally I'd like to avoid downloading the 138 GB just for this purpose as I otherwise don't need it.
I was wondering…
user1030139
8
votes
1 answer
How can you build a model that reads out receipts and invoices?
The objective is to build a model that is capable of identifying information on receipts and invoices that can look completely different.
I've had a discussion with my brother about the right approach. I have attached an example, here the original…
Spurious
8
votes
2 answers
Can I get numeric data from a color map?
In my class I often need to work with color map images. I would show the image and try to make inferences/observations about different subjects. Oftentimes I need to actually quantify some aspects, but it is always very approximate and somehow…
terauser