Highest Voted 'audio-recognition' Questions - Data Science Stack Exchange

12

votes

2 answers

Deep Learning with Spectrograms for sound recognition

I was looking into the possibility of classifying sounds (for example, animal sounds) using spectrograms. The idea is to use a deep convolutional neural network to recognize segments in the spectrogram and output one (or many) class labels. This is…

deep-learning multilabel-classification audio-recognition

asked Jan 29 '16 at 15:39

user667804

271
3
6

9

votes

1 answer

How does a convolutional ply differ from an ordinary convolutional network?

I am currently working on recreating the results of this paper. In the paper they describe a method for using a CNN for feature extraction, and have an acoustic model that is a DNN-HMM pretrained using an RBM. Section III subsection A states…

neural-network convolutional-neural-network feature-extraction audio-recognition

asked Feb 17 '17 at 12:30

Carlton Banks

619
1
6
26

9

votes

1 answer

Neural network with flexible number of inputs?

Is it possible to create a neural network that provides a consistent output when the input vectors can have different lengths? I am currently in a situation where I have sampled a lot of audio files, which are of different lengths, and have to…

neural-network regression tensorflow supervised-learning audio-recognition

asked Oct 20 '16 at 18:46

Carlton Banks

619
1
6
26

8

votes

2 answers

Audio Analysis : Segment audio based on speaker recognition

I have audio clips of people being interviewed and am trying to split them using Python such that all speech segments of the interviewee are output to one audio file (e.g. .wav format) and those of the interviewer to another audio file.…

python data-cleaning audio-recognition

asked Jun 18 '18 at 00:50

aamir23

181
1
4

8

votes

1 answer

CNN for phoneme recognition

I am currently studying this paper, in which a CNN is applied to phoneme recognition using a visual representation of log mel filter banks and a limited weight sharing scheme. The visualisation of log mel filter banks is a way of representing and…

classification keras training convolutional-neural-network audio-recognition

asked Apr 29 '17 at 01:58

Carlton Banks

619
1
6
26

6

votes

3 answers

Tool for labeling audio

I have a few thousand audio signals to label into 2 different classes and save to a numpy array for further training of models. MATLAB recently released Signal Labeler for their Signal Analyzer, which could help to label time series, but for certain…

time-series labels audio-recognition

asked Jul 12 '19 at 10:23

Alexey Abramov

83
1
5

6

votes

1 answer

Detecting voice in a noisy environment

I'm trying to detect voice in a noisy environment. This is a New Zealand suburban/bush soundscape. We're using low-quality microphones (but I don't think this has an effect on my problem) to periodically record the soundscape in an effort to record…

python audio-recognition

asked Feb 03 '18 at 08:47

Finn Maunsell

73
6

6

votes

2 answers

Optimizing CNN network

I am currently trying to recreate the result of this paper, in which they do feature extraction from a "spectrogram" of log-mel filter energies. Since the paper doesn't state what kind of feature I am seeking, I am currently trying to extract…

python keras regression convolutional-neural-network audio-recognition

asked Mar 16 '17 at 02:52

Carlton Banks

619
1
6
26

5

votes

2 answers

Keyword localization in audio file

I want to build a model that can localize occurrences of a particular word in an audio file. For example, I want to find the word "pizza" in a ~5min recording. The program should return an array with (start, stop) objects describing the start and…

cnn object-detection audio-recognition

asked Jan 30 '20 at 00:27

xana

161
5

5

votes

2 answers

How can you efficiently cluster speech segments by speaker?

We have ~30 audio snippets, of which around 50% are from the same speaker, who is our target speaker, and the rest are from various different speakers. We want to extract all audio snippets from our target speaker, so basically figure out which…

clustering pca embeddings audio-recognition voice

asked Mar 11 '25 at 17:54

Yes

181
4

4

votes

1 answer

python - What is the format of the WAV file for a Text to Speech Neural Network?

I am creating a Text to Speech system for a phonetic language called "Kannada" and I plan to train it with a Neural Network. The input is a word/phrase while the output is the corresponding audio. While implementing the Network, I was thinking the…

python neural-network audio-recognition

asked Mar 27 '17 at 19:53

Ajay H

232
1
3
10

4

votes

1 answer

Training a CNN with limited weight sharing

I am currently working with speech recognition, in which I would like to try using a CNN instead of the normal feature extraction step. I have been reading this paper, which proposes a method using a CNN. The input is a visual representation of mel-log filter…

python keras convolutional-neural-network audio-recognition

asked Mar 01 '17 at 16:30

Carlton Banks

619
1
6
26

4

votes

2 answers

How to reduce dimensionality of audio data that comes in form of matrices and vectors?

I'm working on a project that involves distinguishing different types of sounds (such as screams, singing, and bangs) from each other. We've got our data in a reasonable number of different transformations (e.g. spectrograms, chromagrams, MFCCs, etc.),…

python scikit-learn dimensionality-reduction feature-engineering audio-recognition

asked Mar 14 '16 at 00:37

Ben Sandeen

141
3

3

votes

1 answer

How to double audio dataset?

I am trying to develop a mispronunciation detection model for English speech. I use the TIMIT dataset, which is a phoneme-labeled audio dataset. A phoneme is any of the perceptually distinct units of sound. So, my dataset looks like an audio file and…

neural-network dataset machine-learning-model audio-recognition speech-to-text

asked May 20 '21 at 08:29

Abylay Omar

31
2

3

votes

1 answer

What's the best way to validate a rare event detection model during training?

When training a deep model for rare event detection (e.g. sound of an alarm in a home device audio stream), is it best to use a balanced validation set (50% alarm, 50% normal) to determine early stopping etc., or a validation set representative of…

deep-learning anomaly-detection class-imbalance audio-recognition

asked May 31 '20 at 15:14

jack

71
3
