Questions tagged [audio-recognition]

114 questions
12
votes
2 answers

Deep Learning with Spectrograms for sound recognition

I was looking into the possibility of classifying sounds (for example, sounds of animals) using spectrograms. The idea is to use a deep convolutional neural network to recognize segments in the spectrogram and output one (or many) class labels. This is…
9
votes
1 answer

How does a convolutional ply differ from an ordinary convolutional network?

I am currently working on recreating the results of this paper. In the paper they describe a method for using a CNN for feature extraction, and have an acoustic model that is a DNN-HMM pretrained using RBMs. Section III subsection A states…
9
votes
1 answer

Neural network with flexible number of inputs?

Is it possible to create a neural network which provides a consistent output when the inputs are vectors of different lengths? I am currently in a situation where I have sampled a lot of audio files, which are of different lengths, and have to…
8
votes
2 answers

Audio Analysis: Segment audio based on speaker recognition

I have audio clips of people being interviewed and am trying to split the audio clips using Python such that all speech segments of the interviewee are output in one audio file (e.g. .wav format) and those of the interviewer in another audio file.…
aamir23
8
votes
1 answer

CNN for phoneme recognition

I am currently studying this paper, in which a CNN is applied to phoneme recognition using a visual representation of log mel filter banks and a limited weight sharing scheme. The visualisation of log mel filter banks is a way of representing and…
6
votes
3 answers

Tool for labeling audio

I have a few thousand audio signals to label into 2 different classes and save to a NumPy array for further training of models. MATLAB recently released Signal Labeler for their Signal Analyzer, which could help to label time series, but for certain…
6
votes
1 answer

Detecting voice in a noisy environment

I'm trying to detect voice in a noisy environment. This is a New Zealand suburban/bush soundscape. We're using low-quality microphones (but I don't think this has an effect on my problem) to periodically record the soundscape in an effort to record…
6
votes
2 answers

Optimizing CNN network

I am currently trying to recreate the result of this paper, in which they do feature extraction from a "spectrogram" of log-mel filter energies. Since the paper doesn't state what kind of feature I am seeking, I am currently trying to extract…
5
votes
2 answers

Keyword localization in audio file

I want to build a model that can localize occurrences of a particular word in an audio file. For example, I want to find the word "pizza" in a ~5 min recording. The program should return an array of (start, stop) objects describing the start and…
xana
5
votes
2 answers

How can you efficiently cluster speech segments by speaker?

We have ~30 audio snippets, of which around 50% are from the same speaker, who is our target speaker, and the rest are from various different speakers. We want to extract all audio snippets from our target speaker, so basically figure out which…
Yes
4
votes
1 answer

python - What is the format of the WAV file for a Text to Speech Neural Network?

I am creating a Text to Speech system for a phonetic language called "Kannada" and I plan to train it with a Neural Network. The input is a word/phrase while the output is the corresponding audio. While implementing the Network, I was thinking the…
Ajay H
4
votes
1 answer

Training a CNN with limited weight sharing

I am currently working on speech recognition, and I would like to try using a CNN instead of the normal feature extraction step. I have been reading this paper, which proposes a method using a CNN. The input is a visual representation of mel-log filter…
4
votes
2 answers

How to reduce dimensionality of audio data that comes in form of matrices and vectors?

I'm working on a project that involves distinguishing different types of sounds (such as screams, singing, and bangs) from each other. We've got our data in a reasonable number of different transformations (e.g. spectrograms, chromagrams, MFCCs, etc.),…
3
votes
1 answer

How to double audio dataset?

I am trying to develop a mispronunciation detection model for English speech. I use the TIMIT dataset, which is a phoneme-labeled audio dataset. A phoneme is any of the perceptually distinct units of sound. So, my dataset looks like an audio file and…
3
votes
1 answer

What's the best way to validate a rare event detection model during training?

When training a deep model for rare event detection (e.g. sound of an alarm in a home device audio stream), is it best to use a balanced validation set (50% alarm, 50% normal) to determine early stopping etc., or a validation set representative of…