Domenico Stefani research and coding website, Research page.

:::::::::  :::::::::: ::::::::  ::::::::::     :::     :::::::::   ::::::::  :::    :::
:+:    :+: :+:       :+:    :+: :+:          :+: :+:   :+:    :+: :+:    :+: :+:    :+:
+:+    +:+ +:+       +:+        +:+         +:+   +:+  +:+    +:+ +:+        +:+    +:+
+#++:++#:  +#++:++#  +#++:++#++ +#++:++#   +#++:++#++: +#++:++#:  +#+        +#++:++#++
+#+    +#+ +#+              +#+ +#+        +#+     +#+ +#+    +#+ +#+        +#+    +#+
#+#    #+# #+#       #+#    #+# #+#        #+#     #+# #+#    #+# #+#    #+# #+#    #+#
###    ### ########## ########  ########## ###     ### ###    ###  ########  ###    ###

Research Interests

My research is about music AI, sound processing for music, and real-time Music Information Retrieval (MIR) on small resource-constrained devices.
I am interested in both algorithmic and AI approaches to music processing that can support artistic expression through music and end up in the hands of musicians.

Some of the cool projects I have worked on recently are:

Improving efficiency of a convolution-based Ambisonics Spatial Audio Plugin (paper)
Cross-circuit neural effect modeling (paper)
An AI agent improvising with a human player in a duet (paper)
Assessing the importance of accurate onset labels for real-time Music Information Retrieval (paper)
Brain-controlled audio effect chains (paper)

For my PhD I focused on smart musical instruments, which are musical instruments that can be designed to recognize certain high-level traits or properties of a music signal, such as the expressive techniques used or the mood of the music.

Upcoming / Accepted / In Press

•

2025, In Press

Improved Real-Time Six-degrees-of-freedom Dynamic Auralization Through Non-uniformly Partitioned Convolution

Domenico Stefani, Marco Binelli, Angelo Farina, Luca Turchet

(In Press) Journal of the Audio Engineering Society (JAES)

Abstract

Recent years have witnessed an increasing interest from the academic and industrial research community towards software tools for dynamic auralization and six-degrees-of-freedom (6DoF) navigation of immersive audio environments. Some existing tools rely on the convolution of source sounds with Ambisonics impulse responses (IRs) recorded in real spaces. However, despite the advancements in computing power of modern CPUs, convolution is still a rather demanding computation to perform, especially with many channels and in real time. Moreover, efficient computation schemes often used in single-IR-matrix tools have not made their way into open-source 6DoF spatial audio plugins. We present MCFX-6DoFconv, an open-source 6DoF convolution plugin combining the efficient convolution engine of the MCFX-Convolver plugin with the 6DoF navigation features of SPARTA 6DoFconv, along with several functional and interface improvements. Compared to the original SPARTA 6DoFconv, the proposed plugin yields a considerable increase in computing efficiency across a wide range of IR lengths, numbers of channels, and audio buffer sizes, up to a 3.7-fold improvement. This enables real-time auralization with longer IRs and multiple source rendering with more plugin instances. Moreover, the proposed plugin enables instant listener-position updates, eliminating previous delays of up to two buffer sizes and removing the audio latency caused by internal buffering.
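As a rough illustration of the partitioning idea named in the title (this is not code from the plugin), the sketch below splits an impulse response into blocks of growing size, convolves the input with each block, and sums the results at the block's offset. The block-size schedule (doubling every two blocks) and all function names are assumptions made for this example. A real engine would process each partition in the frequency domain; direct time-domain convolution is used here only to keep the example short and dependency-free.

```python
def convolve(x, h):
    """Direct (time-domain) linear convolution of two sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def nonuniform_partitions(ir, first_size):
    """Split `ir` into (offset, block) pairs with growing block sizes.

    The size doubles after every second block. This schedule is an
    assumption for the sketch, not the one used by any specific plugin.
    """
    parts, offset, size, count = [], 0, first_size, 0
    while offset < len(ir):
        parts.append((offset, ir[offset:offset + size]))
        offset += size
        count += 1
        if count % 2 == 0:
            size *= 2
    return parts


def partitioned_convolve(x, ir, first_size=2):
    """Convolve `x` with `ir` by summing delayed per-partition convolutions."""
    y = [0] * (len(x) + len(ir) - 1)
    for offset, block in nonuniform_partitions(ir, first_size):
        # Each partition contributes its own convolution, delayed by its offset.
        for n, value in enumerate(convolve(x, block)):
            y[offset + n] += value
    return y


signal = [1, -2, 3, 0, 5]
impulse_response = list(range(1, 12))
print(partitioned_convolve(signal, impulse_response) == convolve(signal, impulse_response))
```

Because convolution is linear, the partitioned result matches the direct one. Small early partitions keep latency low, while large later partitions are where a frequency-domain implementation saves most of the computation.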

BibTeX

@article{stefani2025improved,
    author = "Stefani, Domenico and Binelli, Marco and Farina, Angelo and Turchet, Luca",
    title = "{Improved Real-Time Six-degrees-of-freedom Dynamic Auralization Through Non-uniformly Partitioned Convolution}",
    journal = "(In Press) Journal of the Audio Engineering Society (JAES)",
    year = "2025",
    note = {In Press},
}

Accepted Preprint GitHub Repository

•

Accepted 2025, September

Morphdrive: Latent Conditioning For Cross-Circuit Effect Modeling And A Parametric Audio Dataset Of Analog Overdrive Pedals

Francesco Ardan Dal Rì, Domenico Stefani, Luca Turchet, Nicola Conci

Accepted at the 28-th Int. Conf. on Digital Audio Effects (DAFx25)

Abstract

In this paper, we present an approach to the neural modeling of overdrive guitar pedals with conditioning from a cross-circuit and cross-setting latent space. The resulting network models the behavior of multiple overdrive pedals across different settings, offering continuous morphing between real configurations and hybrid behaviors. Compact conditioning spaces are obtained through unsupervised training of a variational autoencoder with adversarial training, resulting in accurate reconstruction performance across different sets of pedals. We then compare three Hyper-Recurrent architectures for processing, including dynamic and static Hyper-RNNs, and a smaller model for real-time processing. Additionally, we present pOD-set, a new open dataset including recordings of 27 analog overdrive pedals, each with 36 gain and tone parameter combinations totaling over 97 hours of recordings. Precise parameter setting was achieved through a custom-deployed recording robot.

BibTeX

@inproceedings{dalri2025morphdrive,
    author = "Dal R{\`i}, Francesco A. and Stefani, Domenico and Turchet, Luca and Conci, Nicola",
    title = "{Morphdrive: Latent Conditioning For Cross-Circuit Effect Modeling And A Parametric Audio Dataset Of Analog Overdrive Pedals}",
    booktitle = "(Accepted) 28-th Int. Conf. on Digital Audio Effects (DAFx25)",
    location = "Ancona, Italy",
    year = "2025",
    month = "Sept.",
    publisher = "",
    pages = "Accepted"
}

Accepted Preprint Paper Webpage Dataset GitHub Repository

Publications

Download publication list as BibTeX

•

2025, July

Real-Time Playing Technique Recognition Embedded in a Smart Acoustic Guitar

Domenico Stefani, Luca Turchet

EURASIP Journal on Audio, Speech, and Music Processing

Abstract

The integration of real-time music information retrieval techniques into musical instruments is a crucial step towards smart musical instruments that can reason about the musical context. This paper presents a real-time guitar playing technique recognition system for a smart electro-acoustic guitar. The proposed system comprises a software recognition pipeline running on a Raspberry Pi 4 and is designed to listen to the guitar’s audio signal and classify each note into eight playing techniques, both pitched and percussive. Playing technique information is used in real time to allow the musician to control wirelessly connected stage equipment during performance. The recognition pipeline includes an onset detector, feature extractors, and a convolutional neural classifier. Four pipeline configurations are proposed, striking different balances between accuracy and sound-to-result latency. Results show how optimal performance improvements occur when latency constraints are increased from 15 to 45 ms, with performance varying between pitched and percussive techniques based on available audio context. Our findings highlight the challenges of generalization across players and instruments, demonstrating that accurate recognition requires substantial datasets and carefully selected cross-validation strategies. The research also reveals how individual player styles significantly impact technique recognition performance.

BibTeX

@article{stefani2025realtime,
    author={Stefani, Domenico
    and Turchet, Luca},
    title={Real-time playing technique recognition embedded in a smart acoustic guitar},
    journal={EURASIP Journal on Audio, Speech, and Music Processing},
    year={2025},
    month={Jul},
    day={17},
    volume={2025},
    number={1},
    pages={28},
    abstract={The integration of real-time music information retrieval techniques into musical instruments is a crucial step towards smart musical instruments that can reason about the musical context. This paper presents a real-time guitar playing technique recognition system for a smart electro-acoustic guitar. The proposed system comprises a software recognition pipeline running on a Raspberry Pi 4 and is designed to listen to the guitar's audio signal and classify each note into eight playing techniques, both pitched and percussive. Real-time playing technique information is used in real-time to allow the musician to control wirelessly-connected stage equipment during performance. The recognition pipeline includes an onset detector, feature extractors, and a convolutional neural classifier. Four pipeline configurations are proposed, striking different balances between accuracy and sound-to-result latency. Results show how optimal performance improvements occur when latency constraints are increased from 15 to 45 ms, with performance varying between pitched and percussive techniques based on available audio context. Our findings highlight the challenges of generalization across players and instruments, demonstrating that accurate recognition requires substantial datasets and carefully selected cross-validation strategies. The research also reveals how individual player styles significantly impact technique recognition performance.},
    issn={1687-4722},
    doi={10.1186/s13636-025-00413-6},
    url={https://doi.org/10.1186/s13636-025-00413-6}
}

Download PDF Open-Access Journal Webpage

•

2025, July-August Issue

A Virtual Reality Interface for the Creation of 3D Spatial Audio Trajectories

Matteo Tomasetti, Bavo Van Kerrebroeck, Marcelo M. Wanderley, Domenico Stefani, and Luca Turchet

Journal of the Audio Engineering Society

Abstract

This paper presents SonoSpatia, a Virtual Reality (VR) system designed for creating 3D spatial audio trajectories. SonoSpatia leverages the immersive capabilities of VR technology to facilitate an intuitive and expressive approach for composers and sound designers willing to control positioning parameters in 3D space via gesture-based interactions. We conducted a user study with 12 expert composers, sound engineers, and sound designers to assess the ability of the VR interface to enhance the creative process of spatial audio trajectory creation via a more embodied interaction with the spatial audio parameters. To this end, we compared the workflow of SonoSpatia with those of conventional Digital Audio Workstation (DAW) tools, comprising the ControlGRIS and SpatGRIS spatial audio software and the DAW Reaper. Results indicate that SonoSpatia significantly increases user engagement, satisfaction, absorption, and expressiveness over the conventional DAW-based counterpart. Notably, both the VR and DAW-based interfaces showed minimal differences in the interaction patterns, suggesting similar user interactions across dimensions despite the reported advantages of VR. Despite some challenges related to interface complexity and fine-grained control, the system was well-received by participants, suggesting a promising direction for enhancing 3D spatial audio trajectory creation with VR.

BibTeX

@article{tomasetti2025sonospatia,
    author = {Tomasetti, Matteo and Van Kerrebroeck, Bavo and Wanderley, Marcelo M. and Stefani, Domenico and Turchet, Luca},
    title = {{A Virtual Reality Interface for the Creation of 3D Spatial Audio Trajectories}},
    journal = {Journal of the Audio Engineering Society},
    doi = {10.17743/jaes.2022.0215},
    url = {http://dx.doi.org/10.17743/jaes.2022.0215},
    year={2025},
    volume={73},
    issue={7/8},
    pages={481-492},
    month={July}
}

Accepted Preprint Journal Page ResearchGate

•

2024, November

Musician-AI partnership mediated by emotionally-aware smart musical instruments

Luca Turchet, Domenico Stefani, Johan Pauwels

International Journal of Human-Computer Studies

Abstract

The integration of emotion recognition capabilities within musical instruments can spur the emergence of novel art formats and services for musicians. This paper proposes the concept of emotionally-aware smart musical instruments, a class of musical devices embedding an artificial intelligence agent able to recognize the emotion contained in the musical signal. This spurs the emergence of novel services for musicians. Two prototypes of emotionally-aware smart piano and smart electric guitar were created, which embedded a recognition method for happiness, sadness, relaxation, aggressiveness, and combinations thereof. A user study, conducted with eleven pianists and eleven electric guitarists, revealed the strengths and limitations of the developed technology. On average, musicians appreciated the proposed concept and found value in it for various musical activities. Most participants tended to justify the system's erroneous or partially erroneous classifications of the emotions they expressed, reporting that they understood why a given output was produced. Some participants even seemed to trust the system more than their own judgments. Conversely, other participants requested improvements to the accuracy, reliability, and explainability of the system in order to achieve a higher degree of partnership with it. Our results suggest that, while desirable, perfect prediction of the intended emotion is not an absolute requirement for music emotion recognition to be useful in the construction of smart musical instruments.

BibTeX

@article{turchet2024musicianai,
    title = {Musician-AI partnership mediated by emotionally-aware smart musical instruments},
    journal = {International Journal of Human-Computer Studies},
    volume = {191},
    pages = {103340},
    year = {2024},
    issn = {1071-5819},
    doi = {https://doi.org/10.1016/j.ijhcs.2024.103340},
    url = {https://www.sciencedirect.com/science/article/pii/S107158192400123X},
    author = {Luca Turchet and Domenico Stefani and Johan Pauwels},
    keywords = {Music information retrieval, Music emotion recognition, Smart musical instruments, Transfer learning, Context-aware computing, Trustworthy AI},
}

Download PDF ResearchGate Google_Scholar

•

2024, September

Esteso: Interactive AI Music Duet Based on Player-Idiosyncratic Extended Double Bass Techniques

Domenico Stefani, Matteo Tomasetti, Filippo Angeloni and Luca Turchet

In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME'24), Utrecht, The Netherlands.

Abstract

Extended playing techniques are a crucial characteristic of contemporary double bass practice. Players find their voice by developing a personal vocabulary of techniques through practice and experimentation. These player-idiosyncratic techniques are used in composition, performance, and improvisation. Today's AI methods offer the opportunity to recognize such techniques and repurpose them in real-time, leading to new forms of interactions between musicians and machines. This paper is the result of a collaboration between a composer/double-bass player and researchers, born from the musician's desire for an interactive improvisational experience with AI centered around the practice of his extended techniques. With this aim, we developed Esteso: an interactive improvisational system based on extended technique recognition, live electronics, and a timbre-transfer double-bass model. We evaluated our system with the musician in three duet improvisational sessions, each using different mapping strategies between the techniques and the sound of the virtual double bass counterpart. We collected qualitative data from the musician to gather insights about the three configurations and the corresponding improvisational duets, as well as to investigate the resulting interactions. We provide a discussion about the outcomes of our analysis and draw more general design considerations.

BibTeX

@inproceedings{stefani2024esteso,
    address = {Utrecht, Netherlands},
    articleno = {72},
    author = {Domenico Stefani and Matteo Tomasetti and Filippo Angeloni and Luca Turchet},
    booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
    doi = {10.5281/zenodo.13904929},
    editor = {S M Astrid Bin and Courtney N. Reed},
    issn = {2220-4806},
    month = {September},
    numpages = {9},
    pages = {490--498},
    presentation-video = {https://youtu.be/mdb2Tlh4ub8?si=0m-6kqA_a_p-c2-z},
    title = {Esteso: Interactive AI Music Duet Based on Player-Idiosyncratic Extended Double Bass Techniques},
    track = {Papers},
    url = {http://nime.org/proceedings/2024/nime2024_72.pdf},
    year = {2024}
}

Download PDF ResearchGate GitHub Repository Slides

•

2024, September

On the Importance of Temporally Precise Onset Annotations for Real-Time Music Information Retrieval: Findings from the AG-PT-set Dataset

Domenico Stefani, Gregorio A. Giudici, Luca Turchet

In Proceedings of the 19th International Audio Mostly Conference (AM'24)

Abstract

In real-time Music Information Retrieval (MIR), small analysis windows are essential for achieving low retrieval latency. In turn, event-based real-time MIR methods require precise onset detectors to correctly align with the beginning of events such as musical notes. Detectors are typically trained using ground-truth annotations from datasets of interest. Yet, most MIR datasets do not prioritize the accurate timing of onset labels, and the evaluation of detectors often relies on generous tolerance windows (even ±50 ms). In this paper, we present AG-PT-set, a new dataset of acoustic guitar techniques with precise onset annotations. The dataset features 32,592 individual notes and over 10 hours of audio, covering eight techniques. Moreover, we assess the importance of exact onset labels across multiple real-time MIR tasks. Our results show how accurate timing of onset labels and precise detectors are crucial for real-time MIR tasks, as the performance of most algorithms degrades with imprecise onsets. On a few occasions, imprecise onset timing slightly improved results, hinting at a possible similarity to data augmentation methods. Taken together, our findings indicate that temporally precise labels and detectors are always preferable, as robustness can always be obtained via artificial augmentation, while precision cannot be obtained as easily.
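To make the role of the tolerance window concrete, here is a minimal sketch (not the evaluation code used in the paper) that scores detected onsets against reference annotations. It uses a simple greedy one-to-one matching over sorted onset times, which is an assumption of this example; evaluation toolkits may use a different matching procedure. The onset times below are invented for illustration.

```python
def onset_f_measure(reference, detected, tolerance):
    """F-measure of detected onsets against reference onsets (seconds).

    A detection counts as correct when it lies within +/- `tolerance`
    of a reference onset that has not been matched yet.
    """
    reference, detected = sorted(reference), sorted(detected)
    i = j = matched = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection precedes every remaining reference: false positive
        else:
            i += 1  # no detection close enough to this reference: miss
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations and detections, in seconds.
reference_onsets = [0.500, 1.000, 1.500]
detected_onsets = [0.530, 1.004, 1.560]

print(onset_f_measure(reference_onsets, detected_onsets, tolerance=0.050))
print(onset_f_measure(reference_onsets, detected_onsets, tolerance=0.005))
```

With the generous 50 ms window, two of the three detections are accepted; with a 5 ms window, only one is. The same detector therefore looks considerably better or worse depending only on the tolerance chosen.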

BibTeX

@inproceedings{stefani2024importance,
    author = {Stefani, Domenico and Giudici, Gregorio Andrea and Turchet, Luca},
    title = {On the Importance of Temporally Precise Onset Annotations for Real-Time Music Information Retrieval: Findings from the AG-PT-set Dataset},
    year = {2024},
    isbn = {9798400709685},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3678299.3678325},
    doi = {10.1145/3678299.3678325},
    booktitle = {Proceedings of the 19th International Audio Mostly Conference: Explorations in Sonic Cultures},
    pages = {270--284},
    numpages = {15},
    keywords = {Audio Processing, Music Information Retrieval, Real-time},
    location = {Milan, Italy},
    series = {AM '24}
}

Download PDF Google_Scholar

•

2024, October

BCHJam: a Brain-Computer Music Interface for Live Music Performance in Shared Mixed Reality Environments

M. Romani, G.A. Giudici, D. Stefani, D. Zanoni, A. Boem and L. Turchet

in Proceedings of the 5th International Symposium on the Internet of Sounds (IS2), Erlangen, Germany

Abstract

To date, the integration of brain-computer interfaces and mixed reality headsets in Internet of Musical Things (IoMusT) performance ecosystems has received remarkably little attention from the research community. To bridge this gap, in this paper, we present BCHJam: an IoMusT-based performance ecosystem composed of performers, audience members, brain-computer interfaces, smart musical instruments, and mixed reality headsets. In BCHJam, one or more musicians are fitted with a brain-computer music interface (BCMI) giving them the possibility to actively or passively control the processing of their instrument's audio. Moreover, the BCMI's signal controls mixed reality visual effects displayed in XR headsets worn by audience members. All the components of BCHJam communicate through a Wi-Fi network via Open Sound Control messages. We refined the system through a series of test performance sessions, resulting in the creation of a signal quality filter that improved the musician's experience, along with a tuning of control parameters. The developed ecosystem was validated by realizing a musical performance. We provide a critical reflection on the achieved results and discuss the lessons learned while developing this first-of-its-kind IoMusT performance ecosystem.
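For readers unfamiliar with Open Sound Control, the sketch below shows how a single OSC message is laid out in bytes, following the OSC 1.0 encoding rules (null-terminated strings padded to multiples of four bytes, big-endian 32-bit arguments). It is not taken from the BCHJam code base, and the address `/fx/mix` is a hypothetical name chosen for the example.

```python
import struct


def osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *args):
    """Build an OSC message with float32 ('f') and int32 ('i') arguments."""
    type_tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)  # big-endian 32-bit float
        elif isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)  # big-endian 32-bit integer
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return osc_string(address) + osc_string(type_tags) + payload


# "/fx/mix" is a hypothetical address for an effect's wet/dry control.
packet = osc_message("/fx/mix", 0.5)
print(packet)
```

The resulting bytes can be sent as the payload of a UDP datagram, which is the usual transport for OSC over a Wi-Fi network.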

This paper won the Best Student Paper award

BibTeX

@inproceedings{romani2024bchjam,
    author={Romani, Michele and Giudici, Gregorio Andrea and Stefani, Domenico and Zanoni, Devis and Boem, Alberto and Turchet, Luca},
    booktitle={2024 IEEE 5th International Symposium on the Internet of Sounds (IS2)}, 
    title={{BCHJam:} a Brain-Computer Music Interface for Live Music Performance in Shared Mixed Reality Environments}, 
    year={2024},
    volume={},
    number={},
    pages={1-9},
    keywords={Headphones;Instruments;Ecosystems;Music;Mixed reality;Visual effects;Brain-computer interfaces;Internet;Wireless fidelity;Tuning;Brain-computer interfaces;Mixed Reality;Performance Ecosystem;Internet of Musical Things},
    doi={10.1109/IS262782.2024.10704087}
}

Preprint

•

2024, January

PhD Thesis "Embedded Real-time Deep Learning for a Smart Guitar: A Case Study on Expressive Guitar Technique Recognition"

Domenico Stefani

Abstract

Smart musical instruments are an emerging class of digital musical instruments designed for music creation in an interconnected Internet of Musical Things scenario. These instruments aim to integrate embedded computation, real-time feature extraction, gesture acquisition, and networked communication technologies. As embedded computers become more capable and new embedded audio platforms are developed, new avenues for real-time embedded gesture acquisition open up. Expressive guitar technique recognition is the task of detecting notes and classifying the playing techniques used by the musician on the instrument. Real-time recognition of expressive guitar techniques in a smart guitar would allow players to control sound synthesis or to wirelessly interact with a wide range of interconnected devices and stage equipment during performance. Despite expressive guitar technique recognition being a well-researched topic in the field of Music Information Retrieval, the creation of a lightweight real-time recognition system that can be deployed on an embedded platform still remains an open problem. In this thesis, expressive guitar technique recognition is investigated by focusing on real-time execution, and the execution of deep learning inference on resource-constrained embedded computers. Initial efforts have focused on clearly defining the challenges of embedded real-time music information retrieval, and on the creation of a first, fully embedded, real-time expressive guitar technique recognition system. The insight gained led to the refinement of the various steps of the proposed recognition pipeline. As a first refinement step, a novel procedure for the optimization of onset detectors was developed. The proposed procedure adopts an evolutionary algorithm to find parameter configurations that are optimal both in terms of detection accuracy and latency.
A subsequent study is devoted to shedding light on the performance of generic deep learning inference engines for embedded real-time audio classification. This consisted of a comparison of four common inferencing libraries, which focused on the applicability of each library to real-time audio inference, and their performance in terms of execution time and several additional metrics. Different insights from these studies supported the development of a new expressive guitar technique classifier, which is accompanied by an in-depth analysis of different aspects of the recognition problem. Finally, the experience collected during these studies culminated in the definition of a procedure to deploy deep learning inference to a prominent embedded platform. These investigations have been shown to improve the state-of-the-art by proposing approaches that surpass previous alternatives and providing new knowledge on problems and tools that can aid the creation of a smart guitar. The new knowledge provided was also adopted for embedded audio tasks that differ from real-time expressive guitar technique recognition.

BibTeX

@phdthesis{stefani2024thesis,
    author = {Stefani, Domenico},
    number = {2024},
    school = {University of Trento},
    title = {Embedded Real-time Deep Learning for a Smart Guitar: A Case Study on Expressive Guitar Technique Recognition},
    year = {2024},
    doi = {10.15168/11572_399995},
}

Download PDF

•

2023, October

Real-Time Embedded Deep Learning on Elk Audio OS

Domenico Stefani and Luca Turchet

in Proceedings of the 4th International Symposium on the Internet of Sounds (IS²), Pisa, Italy

Abstract

Recent years have witnessed significant advancements in deep learning architectures for music, along with the availability of more powerful embedded computing platforms specific to low-latency audio processing tasks. These recent developments have opened promising avenues for new Smart Musical Instruments and audio devices that rely on the execution of deep learning models on small embedded computers. Despite these new opportunities, there is a lack of instructions on how to deploy neural networks to many promising embedded audio platforms, including the embedded real-time Elk Audio OS. In this paper, we introduce a procedure for deploying audio deep learning models on embedded systems utilizing the Elk Audio OS. The procedure covers the entire process, from creating a compatible code project to executing and diagnosing it on a Raspberry Pi. Moreover, we discuss different approaches for the real-time execution of deep learning inference on embedded devices and provide alternatives for handling larger neural network models. To facilitate implementation and support future updates, we provide an online repository with a detailed guide, code templates, functional examples, and precompiled library binaries for the TensorFlow Lite and ONNX Runtime inference engines. This work aims to bridge the gap between deep learning model development and real-world deployment on embedded systems, fostering the development of self-contained digital musical instruments and other audio devices equipped with real-time deep learning capabilities. By promoting the deployment of neural networks to embedded devices, we contribute to the development of Smart Musical Instruments that are capable of providing musicians and audiences with unprecedented services.

BibTeX

@inproceedings{stefani2023realtime,
    author={Stefani, Domenico and Turchet, Luca},
    booktitle={4th International Symposium on the Internet of Sounds (IS2)}, 
    title={Real-Time Embedded Deep Learning on Elk Audio OS}, 
    year={2023},
    volume={},
    number={},
    pages={21-30},
    doi={10.1109/IEEECONF59510.2023.10335204}
}

Download PDF GitHub Repository Guide Download Poster PDF

•

2022, September

A Comparison of Deep Learning Inference Engines for Embedded Real-time Audio Classification

Domenico Stefani, Simone Peroni and Luca Turchet

in Proceedings of the 25-th Int. Conf. on Digital Audio Effects (DAFx20in22)

Abstract

BibTeX

@inproceedings{stefani2022comparison,
    author = "Stefani, Domenico and Peroni, Simone and Turchet, Luca",
    title = "{A Comparison of Deep Learning Inference Engines for Embedded Real-Time Audio Classification}",
    booktitle = "Proceedings of the 25-th Int. Conf. on Digital Audio Effects (DAFx20in22)",
    location = "Vienna, Austria",
    eventdate = "2022-09-06/2022-09-10",
    year = "2022",
    month = "Sept.",
    publisher = "",
    issn = "2413-6689",
    volume = "3",
    doi = "",
    pages = "256--263"
}

Download PDF Google_Scholar ResearchGate

•

2022, September

On the challenges of embedded real-time music information retrieval

Domenico Stefani and Luca Turchet

in Proceedings of the 25-th Int. Conf. on Digital Audio Effects (DAFx20in22)

Abstract

Real-time applications of Music Information Retrieval (MIR) have been gaining interest recently. However, as deep learning becomes more and more ubiquitous for music analysis tasks, several challenges and limitations need to be overcome to deliver accurate and quick real-time MIR systems. In addition, modern embedded computers offer great potential for compact systems that use MIR algorithms, such as digital musical instruments. However, embedded computing hardware is generally resource-constrained, posing additional limitations. In this paper, we identify and discuss the challenges and limitations of embedded real-time MIR. Furthermore, we discuss potential solutions to these challenges, and demonstrate their validity by presenting an embedded real-time classifier of expressive acoustic guitar techniques. The classifier achieved 99.2% accuracy in distinguishing pitched and percussive techniques and a 99.1% average accuracy in distinguishing four distinct percussive techniques with a fifth class for pitched sounds. The full classification task is a considerably more complex learning problem, with our preliminary results reaching only 56.5% accuracy. The results were produced with an average latency of 30.7 ms.

BibTeX

@inproceedings{stefani2022challenges,
    author = "Stefani, Domenico and Turchet, Luca",
    title = "{On the Challenges of Embedded Real-Time Music Information Retrieval}",
    booktitle = "Proceedings of the 25-th Int. Conf. on Digital Audio Effects (DAFx20in22)",
    location = "Vienna, Austria",
    eventdate = "2022-09-06/2022-09-10",
    year = "2022",
    month = "Sept.",
    publisher = "",
    issn = "2413-6689",
    volume = "3",
    doi = "",
    pages = "177--184"
}

Download PDF Google_Scholar ResearchGate

•

2021, September

Bio-Inspired Optimization of Parametric Onset Detectors

Domenico Stefani and Luca Turchet

in Proceedings of the 24th International Conference on Digital Audio Effects (DAFx20in21), 2021

Abstract

Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
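The Pareto-frontier step mentioned above can be sketched in a few lines. This is an illustration only, not the paper's implementation: it covers just the final filtering step (not the evolutionary search), and the configuration names, F-measure values, and latencies are invented for the example.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates.

    Each candidate is (name, f_measure, latency_ms). Higher F-measure and
    lower latency are better. A candidate is dominated when another one is
    at least as good on both objectives and strictly better on one.
    """
    def dominates(a, b):
        at_least_as_good = a[1] >= b[1] and a[2] <= b[2]
        strictly_better = a[1] > b[1] or a[2] < b[2]
        return at_least_as_good and strictly_better

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical detector configurations: (name, F-measure, latency in ms).
configs = [
    ("a", 0.90, 12.0),
    ("b", 0.95, 20.0),
    ("c", 0.88, 15.0),  # worse than "a" on both objectives
    ("d", 0.95, 25.0),  # same accuracy as "b" but slower
    ("e", 0.80, 5.0),
]

print([name for name, _, _ in pareto_front(configs)])
```

The remaining configurations are the ones worth choosing between: each trades some accuracy for lower latency or the other way around, and none is beaten on both counts.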

BibTeX

@inproceedings{stefani2021bioinspired,
    author = "Stefani, Domenico and Turchet, Luca",
    title = "{Bio-Inspired Optimization of Parametric Onset Detectors}",
    booktitle = "Proc. 24th Int. Conf. on Digital Audio Effects (DAFx20in21)",
    location = "Vienna, Austria",
    eventdate = "2021-09-08/2021-09-10",
    year = "2021",
    month = "Sept.",
    publisher = "",
    issn = "2413-6689",
    volume = "2",
    pages = "268--275",
    doi={10.23919/DAFx51585.2021.9768293}
}

Download PDF

Short Papers and Talks

•

2024, November

Demo of Esteso: an AI Music Duet Based on Extended Double Bass Techniques

Dictionary for Multidisciplinary Music Integration (DIMMI), Trento, November 29-30, 2024

This demo won the Best Demonstration award

Abstract

Esteso is an interactive improvisational system for double-bass based on player-idiosyncratic extended techniques. This system was created in collaboration with the contemporary double-bass player and composer Filippo Angeloni and tailored for his personal vocabulary of extended techniques. In Esteso, an AI agent and the player engage in a duet, taking turns in the performance. The system replies with a manipulation of the real double-bass, achieved live through a timbre-transfer neural network, granular synthesis, and reverb. The timbre-transfer network was trained on a public double-bass dataset, resulting in a peculiar hybrid sound. Machine listening is integrated through a classifier of extended techniques played on the double-bass, whose output controls sound processing to affect various techniques differently. We present a demonstration of a performance where the double-bass player interacts with Esteso, creating a back-and-forth interplay between the acoustic and virtual elements.

Demo Proposal PDF

•

2024, November

“Engine-Swap” on Two Spatial Audio Plugins Will Be Easy, Right? Lessons Learned

Talk at the Audio Developer Conference 2024, 11-13 Nov 2024, Bristol, UK

Abstract

Tackling a project that involves swapping the cores of two audio plugins seemed straightforward at first:
Yes, spatial audio is complex, but these are two similar JUCE plugins and I can just swap the core code components, it will be easy, right?
It wasn't, and it uncovered many unexpected challenges and learning opportunities.

In this talk, I will share my experience of improving an existing spatial audio plugin (SPARTA 6DoFConv) by replacing its convolution engine with a more efficient alternative. This process required deep dives into complex and sparsely commented audio DSP code and problem-solving.

The core of this presentation will focus on the general learning points from this endeavor. I will discuss some of the strategies I employed to understand and navigate complex codebases and the practical steps taken to embed a new convolution engine in an audio plugin. Additionally, I will explore the unforeseen issues that arose, such as dealing with the drawbacks of highly optimized algorithms and integrating a crossfade system, compromising between efficiency and level of integration.

This talk aims to provide valuable insights for developers, especially those who are starting out and want to understand and customize other people's code. Join me in exploring the lessons learned, strategies employed, and trade-offs considered in creating a more efficient six-degrees-of-freedom spatial audio plugin.

Key Points:

• Addressing challenges with optimized algorithms and accepting trade-offs;
• General lessons learned and best practices when working with other people's plugin code;
• Practical knowledge of different multichannel convolution engines for Ambisonics reverberation and 6 degrees-of-freedom navigation for extended reality applications.
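
The abstract mentions integrating a crossfade system but does not describe it. For readers unfamiliar with the idea, here is a minimal sketch of a linear crossfade between the output block of an "old" convolution (previous listener position) and that of a "new" one (updated position). The function name `linear_crossfade`, the use of plain Python lists, and the one-block fade length are assumptions for illustration only, not the plugin's actual implementation.

```python
from typing import List


def linear_crossfade(old_block: List[float], new_block: List[float]) -> List[float]:
    """Fade from `old_block` to `new_block` over the length of one audio block.

    The gain of the old signal ramps from 1 to 0 while the gain of the new
    signal ramps from 0 to 1, so the two gains always sum to 1.
    """
    n = len(old_block)
    if n != len(new_block):
        raise ValueError("Blocks must have the same length")
    out = []
    for i in range(n):
        # Fade position in [0, 1]; a single-sample block jumps to the new signal.
        t = i / (n - 1) if n > 1 else 1.0
        out.append((1.0 - t) * old_block[i] + t * new_block[i])
    return out


# Hypothetical blocks: constant 1.0 from the old engine, silence from the new one.
faded = linear_crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(faded)
```

A fade like this avoids the click that an abrupt switch between impulse responses would produce, at the cost of running both convolutions while the fade lasts. That is one example of the efficiency trade-off the talk refers to.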

ADC link Slides (PDF) Slides (PPTX)

•

2023, October

Demo: Real-Time Embedded Deep Learning on Elk Audio OS

Domenico Stefani

in 4th International Symposium on the Internet of Sounds (IS²), Pisa, Italy

Abstract

Recent advancements in deep learning architectures for music and the availability of powerful embedded computing platforms for low-latency audio processing have opened up exciting possibilities for new digital and Smart Musical Instruments. However, deploying neural networks to various embedded audio platforms remains a challenge. In particular, while Elk Audio OS on the Raspberry Pi 4 proved to be a capable platform for deep learning and audio processing, the deployment process has not been documented yet. In response, this demo proposal accompanies a systematic guide to deploying deep learning models for audio on embedded systems using Elk Audio OS. The proposed demo will cover the entire process, from creating a compatible code project to executing and diagnosing a VST plugin with deep learning inference on a Raspberry Pi. The demo will explore different approaches for real-time execution of deep learning inference on embedded devices and provide solutions for handling larger neural network models. To facilitate implementation and future updates, an online repository is provided, offering clean code templates, functional examples, and precompiled library binaries for the TensorFlow Lite and ONNX Runtime inference engines. The primary aim of this demo is to help bridge the gap between deep learning model development and real-world deployment on embedded systems, fostering the creation of self-contained digital musical instruments and audio devices equipped with real-time deep learning capabilities. By promoting the deployment of neural networks to embedded devices, this demo seeks to contribute to the development of Smart Musical Instruments capable of providing musicians and audiences with unprecedented services. Attendees will gain insights into the process of deploying deep learning models on embedded computers with Elk Audio OS.

Download PDF

•

2022, October

Riconoscimento in Tempo Reale di Tecniche Espressive per Chitarra su Embedded Computers

Domenico Stefani

in Proceedings of the XXIII Colloquio di Informatica Musicale/Colloquium of Musical Informatics (CIM), Ancona, Italy

🇬🇧 Abstract

In the last decades of the past century, innovations in analog electronics led to the creation of a series of guitar-based instrument-controllers that could produce a wide range of sounds by directly controlling audio synthesizers. Despite the various improvements brought to these systems over the years, they have always been limited to tracking the frequency and intensity over time of the notes played, failing to consider the subtle expressive techniques generally used by guitarists to mutate the sound of the instrument. In this paper I present my research, which aims to use the most modern techniques of deep learning to enable real-time recognition of the expressive technique used on a guitar. Particular attention will be paid to proposing implementations on embedded devices and considering the limitations of these, in a manner that can enable the creation of new intelligent instruments where signal analysis and synthesis of new sounds is carried out in a self-contained manner. Along with a description of the research and methodology used, the results obtained so far in deep learning for real-time classification of audio, classification of expressive techniques, and onset detection are presented. Finally, the lines of research that will be pursued in the near future are presented.

🇮🇹 Abstract

Negli ultimi decenni dello scorso secolo, le innovazioni nel campo dell'elettronica analogica hanno portato alla creazione di una serie di strumenti-controller basati sulla chitarra che potevano produrre un'ampia gamma di suoni controllando direttamente dei sintetizzatori audio. Nonostante i vari miglioramenti portati a questi sistemi negli anni, essi sono sempre stati limitati al tracciamento della frequenza e intensità nel tempo delle note suonate, mancando di considerare le sottili tecniche espressive generalmente usate dai chitarristi per mutare il suono dello strumento. In questo documento presento la mia ricerca, che si propone di utilizzare le tecniche più moderne del deep learning per permettere il riconoscimento in tempo reale della tecnica espressiva usata su di una chitarra. Particolare attenzione verrà dedicata a proporre implementazioni su dispositivi embedded e a considerare le limitazioni di questi, di maniera da poter permettere la creazione di nuovi strumenti intelligenti dove l'analisi del segnale e sintesi di nuovi suoni venga svolta in maniera auto-contenuta. Assieme ad una descrizione della ricerca e della metodologia utilizzata, vengono presentati i risultati ottenuti fino ad ora nel deep learning per classificazione in tempo reale di audio, classificazione di tecniche espressive e onset detection. Infine, vengono presentate le linee di ricerca che verranno seguite nel futuro prossimo.

Download PDF Download Poster PDF

•

2022, June

Workshop Talk: Embedded Real-Time Expressive Guitar Technique Recognition

Domenico Stefani

in Embedded AI for NIME: Challenges and Opportunities Workshop

Download PDF ResearchGate

•

2020, September

Demo of the TimbreID-VST Plugin for Embedded Real-Time Classification of Individual Musical Instrument Timbres

Domenico Stefani and Luca Turchet

in Proceedings of the 27th Conference of the Open Innovations Association (FRUCT), 2020

Abstract

This demo presents the timbreID-VST plugin, an audio plugin in Virtual Studio Technology format dedicated to the embedded real-time classification of individual musical instrument timbres. The plugin was created by porting the code of the timbreID library, a collection of objects for the real-time programming language Pure Data that allows the real-time classification of features of audio signals. The JUCE framework and the build tools provided by the Elk Audio OS operating system were used, which allows the plugin to run on the embedded systems supported by Elk Audio OS. The availability of the timbreID-VST plugin utilities as a library facilitates the development of intelligent applications for embedded audio, such as smart musical instruments. The plugin was trained to classify percussive timbres from an acoustic guitar.

BibTeX

@inproceedings{stefani2020demo,
    title={Demo of the TimbreID-VST Plugin for Embedded Real-Time Classification of Individual Musical Instruments Timbres},
    author={Stefani, Domenico and Turchet, Luca},
    booktitle={Proc. 27th Conf. of Open Innovations Association (FRUCT)},
    volume={2},
    pages={412--413},
    year={2020}
}

Download PDF