The research project focuses on how sound data can be converted into information that humans and machines can understand and act upon. It started on 14 March 2016 and will run until 13 March 2019. The project is funded by the Engineering and Physical Sciences Research Council (EPSRC) with a funding value of £1,275,401. It is a joint project between the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey and the Acoustics Research Centre at the University of Salford.

A project overview can be found here.

DCASE 2017 challenge success

Yong Xu, Qiuqiang Kong, Wenwu Wang and Mark Plumbley won first prize in Subtask A, ‘audio tagging’, of Task 4, ‘large-scale weakly supervised sound event detection for smart cars’, in the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2017). The DCASE challenge is the most prominent challenge in the non-speech audio domain. It is organised by Tampere University of Technology, Carnegie Mellon University and INRIA, and sponsored by Google and Audio Analytic. Owing to this unique standing, it attracts the leading groups in the field, including CMU, New York University, Bosch, USC, TUT, Singapore A*STAR, the Korea Advanced Institute of Science and Technology, Seoul National University, National Taiwan University and CVSSP.

New publication: 'Masked Non-negative Matrix Factorization for Bird Detection Using Weakly Labelled Data'

Sobieraj, Iwona, Kong, Qiuqiang and Plumbley, Mark (2017) Masked Non-negative Matrix Factorization for Bird Detection Using Weakly Labelled Data. In: 25th European Signal Processing Conference (EUSIPCO 2017), 28 Aug - 2 Sep 2017, Kos Island, Greece.

New publication: 'Joint Detection and Classification Convolutional Neural Network on Weakly Labelled Bird Audio Detection'

Kong, Qiuqiang, Xu, Yong and Plumbley, Mark (2017) Joint Detection and Classification Convolutional Neural Network on Weakly Labelled Bird Audio Detection. In: 25th European Signal Processing Conference (EUSIPCO 2017), 28 Aug - 2 Sep 2017, Kos Island, Greece.

New publication: 'Using deep neural networks to estimate tongue movements from speech face motion'

Kroos, Christian, Bundgaard-Nielsen, R. L., Best, C. T. and Plumbley, Mark (2017) Using deep neural networks to estimate tongue movements from speech face motion. In: 14th International Conference on Auditory-Visual Speech Processing (AVSP2017), 25 - 26 August 2017, Stockholm, Sweden.

New publication: 'Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks'

Kroos, Christian and Plumbley, Mark (2017) Learning the mapping function from voltage amplitudes to sensor positions in 3D-EMA using deep neural networks. In: Interspeech 2017, 20 - 24 August 2017, Stockholm, Sweden.

New publication: 'Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging'

Xu, Yong, Kong, Qiuqiang, Huang, Qiang, Wang, Wenwu and Plumbley, Mark (2017) Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging. In: Interspeech 2017, 20 - 24 August 2017, Stockholm, Sweden.

'Immersions' exhibition

Three members of the Making Sense of Sounds project (David Frohlich, Philip Jackson and Christian Kroos) are taking part in an art exhibition exploring the theme of ‘water’ through 2D and 3D photography, spatial audio, video and installation art.

New publication: 'Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging'

Xu, Y, Kong, Q, Huang, Q, Wang, W and Plumbley, MD (2017) Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging. In: International Joint Conference on Neural Networks (IJCNN 2017), Anchorage, Alaska, USA.

New publication: 'Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging'

Xu, Y, Huang, Q, Wang, W, Foster, P, Sigtia, S, Jackson, PJB and Plumbley, MD (2017) Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

New publication: 'Coupled Sparse NMF vs. Random Forest Classification for Real Life Acoustic Event Detection'

Sobieraj, Iwona and Plumbley, Mark (2016) Coupled Sparse NMF vs. Random Forest Classification for Real Life Acoustic Event Detection. In: DCASE2016 Workshop (Workshop on Detection and Classification of Acoustic Scenes and Events), 3 September 2016, Budapest, Hungary.

New publication: 'Hierarchical Learning for DNN-Based Acoustic Scene Classification'

Xu, Y, Huang, Q, Wang, W and Plumbley, MD (2016) Hierarchical Learning for DNN-Based Acoustic Scene Classification. In: DCASE2016 Workshop (Workshop on Detection and Classification of Acoustic Scenes and Events), Budapest, Hungary.

New publication: 'Fully DNN-based Multi-label regression for audio tagging'

Xu, Y, Huang, Q, Wang, W, Jackson, PJB and Plumbley, MD (2016) Fully DNN-based Multi-label regression for audio tagging. In: DCASE2016 Workshop (Workshop on Detection and Classification of Acoustic Scenes and Events), Budapest, Hungary.

New publication: 'Fast Tagging of Natural Sounds Using Marginal Co-regularization'

Huang, Qiang, Xu, Yong, Jackson, Philip J. B., Wang, Wenwu and Plumbley, Mark D. (2017) Fast Tagging of Natural Sounds Using Marginal Co-regularization. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), New Orleans, USA.

DCASE 2016 Challenge: Random system performance in sound event detection in real life audio

This report describes the creation of a data-blind random system that provides a chance-level baseline for Task 3 (sound event detection in real life audio) of the DCASE 2016 challenge. Particular attention is paid to the results for two sound events occurring in the residential area scene, one very rare, the other very frequent. The relatively good performance of the random system compared with the proper detection systems shows how difficult Task 3 remains for current state-of-the-art sound event detection methods.
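
The report gives the details of the system, but the core idea of a data-blind baseline is easy to illustrate: for each recording, emit randomly placed event segments with randomly chosen labels, without ever reading the audio. The following Python sketch is purely illustrative; the label set, recording duration and output format are assumptions, not the settings used in the report.

    import random

    # Illustrative assumptions only; the actual DCASE 2016 Task 3
    # classes, recording lengths and parameters differ.
    LABELS = ["bird singing", "car passing by", "children shouting"]
    DURATION = 120.0  # assumed recording length in seconds

    def random_detections(n_events=5, max_len=10.0):
        """Emit random (onset, offset, label) triples, never reading the audio."""
        events = []
        for _ in range(n_events):
            onset = random.uniform(0.0, DURATION - max_len)
            offset = onset + random.uniform(0.5, max_len)
            events.append((onset, offset, random.choice(LABELS)))
        return sorted(events)

    # One annotation line per event, in the tab-separated
    # onset/offset/label style used for sound event detection output.
    for onset, offset, label in random_detections():
        print(f"{onset:.3f}\t{offset:.3f}\t{label}")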

In the media: MIT News - 'Computer learns to recognize sounds by watching video'

Mark Plumbley appeared in an article about machine learning by Larry Hardesty for MIT News.

In the media: BBC Radio 3 - 'The Verb'

Trevor Cox appeared on ‘The Verb’ on BBC Radio 3.

New publication: 'A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data'

Kong, Qiuqiang, Xu, Yong, Wang, Wenwu and Plumbley, Mark (2016) A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data. arXiv preprint arXiv:1610.01797.

New publication: 'Deep neural network baseline for DCASE challenge 2016'

Kong, Qiuqiang, Sobieraj, Iwona, Wang, Wenwu and Plumbley, Mark (2016) Deep Neural Network Baseline for DCASE Challenge 2016. In: DCASE2016 Workshop (Workshop on Detection and Classification of Acoustic Scenes and Events), Budapest, Hungary.

In the media: New Scientist - 'Binge-watching videos teaches computers to recognise sounds'

Mark Plumbley was interviewed by Aviva Hope Rutkin for New Scientist.
