Domenico Stefani
Research and Coding

:::::::::  :::::::::: ::::::::  ::::::::::     :::     :::::::::   ::::::::  :::    :::
:+:    :+: :+:       :+:    :+: :+:          :+: :+:   :+:    :+: :+:    :+: :+:    :+:
+:+    +:+ +:+       +:+        +:+         +:+   +:+  +:+    +:+ +:+        +:+    +:+
+#++:++#:  +#++:++#  +#++:++#++ +#++:++#   +#++:++#++: +#++:++#:  +#+        +#++:++#++
+#+    +#+ +#+              +#+ +#+        +#+     +#+ +#+    +#+ +#+        +#+    +#+
#+#    #+# #+#       #+#    #+# #+#        #+#     #+# #+#    #+# #+#    #+# #+#    #+#
###    ### ########## ########  ########## ###     ### ###    ###  ########  ###    ###

                

My research is about music AI and real-time Music Information Retrieval (MIR) on small resource-constrained devices.

For my PhD I focused on smart musical instruments: musical instruments that can be designed to recognize high-level traits or properties of a music signal, such as the expressive techniques used or the mood of the music.

In turn, smart instruments can use these high-level properties to trigger audio samples during a performance, control the synthesis of accompanying sounds, morph and trigger transitions on live visuals, control stage equipment and lighting, and more. However, the underlying AI methods and algorithms must be lightweight, reliable, and fast, so that they can run in real time on tiny resource-constrained devices that augment a musician's performance.

My main line of research has focused on the acoustic guitar and the recognition of both pitched and percussive expressive techniques (palm mute, harmonics, hitting different areas of the guitar, and so on).

2024 (accepted)

Musician-AI partnership mediated by emotionally-aware smart musical instruments

Luca Turchet, Domenico Stefani, Johan Pauwels

International Journal of Human-Computer Studies

Abstract
The integration of emotion recognition capabilities within musical instruments can spur the emergence of novel art formats and services for musicians. This paper proposes the concept of emotionally-aware smart musical instruments, a class of musical devices embedding an artificial intelligence agent able to recognize the emotion contained in the musical signal. Two prototypes, an emotionally-aware smart piano and a smart electric guitar, were created, each embedding a recognition method for happiness, sadness, relaxation, aggressiveness, and combinations thereof. A user study, conducted with eleven pianists and eleven electric guitarists, revealed the strengths and limitations of the developed technology. On average, musicians appreciated the proposed concept and found value in it for various musical activities. Most participants tended to justify the system when it produced erroneous or partially erroneous classifications of the emotions they expressed, reporting that they understood why a given output was produced. Some participants even seemed to trust the system more than their own judgments. Conversely, other participants requested improvements to the accuracy, reliability and explainability of the system in order to achieve a higher degree of partnership with it. Our results suggest that, while desirable, perfect prediction of the intended emotion is not an absolute requirement for music emotion recognition to be useful in the construction of smart musical instruments.
2024, September (accepted)

On the Importance of Temporally Precise Onset Annotations for Real-Time Music Information Retrieval: Findings from the AG-PT-set Dataset

Domenico Stefani, Gregorio A. Giudici, Luca Turchet

In Proceedings of the 19th International Audio Mostly Conference (AM'24)

Abstract
In real-time Music Information Retrieval (MIR), small analysis windows are essential for achieving low retrieval latency. In turn, event-based real-time MIR methods require precise onset detectors to correctly align with the beginning of events such as musical notes. Detectors are typically trained using ground-truth annotations from datasets of interest. Yet, most MIR datasets do not prioritize the accurate timing of onset labels, and the evaluation of detectors often relies on generous tolerance windows (even ±50 ms). In this paper, we present AG-PT-set, a new dataset of acoustic guitar techniques with precise onset annotations. The dataset features 32,592 individual notes and over 10 hours of audio, covering eight techniques. Moreover, we assess the importance of exact onset labels across multiple real-time MIR tasks. Our results show that accurate timing of onset labels and precise detectors are crucial for real-time MIR tasks, as the performance of most algorithms degrades with imprecise onsets. On a few occasions, imprecise onset timing slightly improved results, hinting at a possible similarity to data augmentation methods. Taken together, our findings indicate that temporally precise labels and detectors are always preferable, as robustness can always be obtained via artificial augmentation, while precision cannot be obtained as easily.
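To make the point about tolerance windows concrete, the following is a minimal sketch of how a tolerance window affects onset detection scores. It is not the evaluation code used in the paper: the function name `evaluate_onsets`, the simple greedy matching, and the example onset times are all assumptions chosen for illustration.

```python
def evaluate_onsets(reference, detected, tolerance):
    """Score detected onsets against reference onsets (all times in seconds).

    Uses a simple greedy one-to-one matching over sorted lists (an assumption
    for this sketch, not the paper's method). A detection counts as correct
    when it lies within +/- tolerance of an unmatched reference onset.
    Returns (precision, recall, f1).
    """
    reference = sorted(reference)
    detected = sorted(detected)
    matched = 0
    i = j = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            matched += 1      # hit: consume both onsets
            i += 1
            j += 1
        elif diff < 0:
            j += 1            # detection too early: false positive
        else:
            i += 1            # reference left behind: missed onset
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical data: every detection is 30 ms late.
reference = [0.50, 1.00, 1.50, 2.00]
detected = [0.53, 1.03, 1.53, 2.03]

print(evaluate_onsets(reference, detected, tolerance=0.050))  # (1.0, 1.0, 1.0)
print(evaluate_onsets(reference, detected, tolerance=0.010))  # (0.0, 0.0, 0.0)
```

With a ±50 ms window the consistently late detector looks perfect, while a ±10 ms window exposes the timing error, which is the kind of discrepancy that matters when small analysis windows must align with the start of a note.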
2024, September (accepted)

Esteso: Interactive AI Music Duet Based on Player-Idiosyncratic Extended Double Bass Techniques

Domenico Stefani, Matteo Tomasetti, Filippo Angeloni and Luca Turchet

In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME'24), Utrecht, The Netherlands.

Abstract
Extended playing techniques are a crucial characteristic of contemporary double bass practice. Players find their voice by developing a personal vocabulary of techniques through practice and experimentation. These player-idiosyncratic techniques are used in composition, performance, and improvisation. Today's AI methods offer the opportunity to recognize such techniques and repurpose them in real-time, leading to new forms of interactions between musicians and machines. This paper is the result of a collaboration between a composer/double-bass player and researchers, born from the musician's desire for an interactive improvisational experience with AI centered around the practice of his extended techniques. With this aim, we developed Esteso: an interactive improvisational system based on extended technique recognition, live electronics, and a timbre-transfer double-bass model. We evaluated our system with the musician in three duet improvisational sessions, each using different mapping strategies between the techniques and the sound of the virtual double bass counterpart. We collected qualitative data from the musician to gather insights about the three configurations and the corresponding improvisational duets, and to investigate the resulting interactions. We provide a discussion about the outcomes of our analysis and draw more general design considerations.
BibTeX
@inproceedings{stefani2024esteso,
    author = {Stefani, Domenico and Tomasetti, Matteo and Angeloni, Filippo and Turchet, Luca},
    year = {2024},
    month = {09},
    pages = {},
    title = {(Accepted) Esteso: Interactive AI Music Duet Based on Player-Idiosyncratic Extended Double Bass Techniques},
    booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression (NIME'24)},
}
2023, January

PhD Thesis "Embedded Real-time Deep Learning for a Smart Guitar: A Case Study on Expressive Guitar Technique Recognition"

Domenico Stefani

Abstract
Smart musical instruments are an emerging class of digital musical instruments designed for music creation in an interconnected Internet of Musical Things scenario. These instruments aim to integrate embedded computation, real-time feature extraction, gesture acquisition, and networked communication technologies. As embedded computers become more capable and new embedded audio platforms are developed, new avenues for real-time embedded gesture acquisition open up. Expressive guitar technique recognition is the task of detecting notes and classifying the playing techniques used by the musician on the instrument. Real-time recognition of expressive guitar techniques in a smart guitar would allow players to control sound synthesis or to wirelessly interact with a wide range of interconnected devices and stage equipment during performance. Despite expressive guitar technique recognition being a well-researched topic in the field of Music Information Retrieval, the creation of a lightweight real-time recognition system that can be deployed on an embedded platform still remains an open problem. In this thesis, expressive guitar technique recognition is investigated by focusing on real-time execution, and the execution of deep learning inference on resource-constrained embedded computers. Initial efforts have focused on clearly defining the challenges of embedded real-time music information retrieval, and on the creation of a first, fully embedded, real-time expressive guitar technique recognition system. The insight gained led to the refinement of the various steps of the proposed recognition pipeline. As a first refinement step, a novel procedure for the optimization of onset detectors was developed. The proposed procedure adopts an evolutionary algorithm to find parameter configurations that are optimal both in terms of detection accuracy and latency.
A subsequent study is devoted to shedding light on the performance of generic deep learning inference engines for embedded real-time audio classification. This consisted of a comparison of four common inferencing libraries, focusing on the applicability of each library to real-time audio inference and on their performance in terms of execution time and several additional metrics. Different insights from these studies supported the development of a new expressive guitar technique classifier, which is accompanied by an in-depth analysis of different aspects of the recognition problem. Finally, the experience collected during these studies culminated in the definition of a procedure to deploy deep learning inference to a prominent embedded platform. These investigations have been shown to improve the state-of-the-art by proposing approaches that surpass previous alternatives and providing new knowledge on problems and tools that can aid the creation of a smart guitar. The new knowledge provided was also adopted for embedded audio tasks that differ from real-time expressive guitar technique recognition.
BibTeX
@phdthesis{stefani2024embedded,
    author = {Stefani, Domenico},
    number = {2024},
    school = {University of Trento},
    title = {{Embedded Real-time Deep Learning for a Smart Guitar: A Case Study on Expressive Guitar Technique Recognition}},
    year = {2024}
}
2023, October

Real-Time Embedded Deep Learning on Elk Audio OS

Domenico Stefani and Luca Turchet

in Proceedings of the 4th International Symposium on the Internet of Sounds (IS2), Pisa, Italy

Abstract
Recent years have witnessed significant advancements in deep learning architectures for music, along with the availability of more powerful embedded computing platforms specific to low-latency audio processing tasks. These recent developments have opened promising avenues for new Smart Musical Instruments and audio devices that rely on the execution of deep learning models on small embedded computers. Despite these new opportunities, there is a lack of instructions on how to deploy neural networks to many promising embedded audio platforms, including the embedded real-time Elk Audio OS. In this paper, we introduce a procedure for deploying audio deep learning models on embedded systems utilizing the Elk Audio OS. The procedure covers the entire process, from creating a compatible code project to executing and diagnosing it on a Raspberry Pi. Moreover, we discuss different approaches for the real-time execution of deep learning inference on embedded devices and provide alternatives for handling larger neural network models. To facilitate implementation and support future updates, we provide an online repository with a detailed guide, code templates, functional examples, and precompiled library binaries for the TensorFlow Lite and ONNX Runtime inference engines. This work aims to bridge the gap between deep learning model development and real-world deployment on embedded systems, fostering the development of self-contained digital musical instruments and other audio devices equipped with real-time deep learning capabilities. By promoting the deployment of neural networks to embedded devices, we contribute to the development of Smart Musical Instruments that are capable of providing musicians and audiences with unprecedented services.
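As an illustration of what one approach to real-time inference can look like, the sketch below defers inference to a worker thread so that the audio-processing loop never waits on the model. This is an assumed, generic pattern and not the procedure from the paper: the function `run_deferred_inference`, the `model` callable, and the block contents are hypothetical, and Python's `queue.Queue` uses locks, whereas a real-time audio plugin would typically rely on a lock-free FIFO in C++.

```python
import queue
import threading


def run_deferred_inference(blocks, model, max_pending=8):
    """Simulate an audio loop that hands feature blocks to a worker thread.

    The loop never blocks on inference: if the worker falls behind and the
    queue is full, the block is dropped and counted instead of waiting.
    Returns (results, dropped), where results is a list of (index, output).
    """
    pending = queue.Queue(maxsize=max_pending)
    results = []

    def worker():
        while True:
            item = pending.get()
            if item is None:          # sentinel: no more blocks
                break
            index, features = item
            results.append((index, model(features)))

    thread = threading.Thread(target=worker)
    thread.start()

    dropped = 0
    for index, features in enumerate(blocks):
        try:
            pending.put_nowait((index, features))   # non-blocking hand-off
        except queue.Full:
            dropped += 1

    pending.put(None)   # blocking put is fine here: the "audio loop" is over
    thread.join()
    return results, dropped


# Hypothetical stand-in for a neural network: sums the features of a block.
blocks = [[1, 2], [3, 4], [5, 6]]
results, dropped = run_deferred_inference(blocks, model=sum)
print(results, dropped)
```

The trade-off in this pattern is that results arrive some time after the block that produced them, so it suits models that are too slow to run inside a single audio callback.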
BibTeX
@inproceedings{stefani2023realtime,
    author={Stefani, Domenico and Turchet, Luca},
    booktitle={4th International Symposium on the Internet of Sounds (IS2)},
    title={Real-Time Embedded Deep Learning on Elk Audio OS},
    year={2023},
    pages={21--30},
    doi={10.1109/IEEECONF59510.2023.10335204}
}
2022, September

A Comparison of Deep Learning Inference Engines for Embedded Real-time Audio Classification

Domenico Stefani, Simone Peroni and Luca Turchet

in Proceedings of the 25th International Conference on Digital Audio Effects (DAFx20in22)

BibTeX
@inproceedings{stefani2022comparison,
    author = "Stefani, Domenico and Peroni, Simone and Turchet, Luca",
    title = "{A Comparison of Deep Learning Inference Engines for Embedded Real-Time Audio Classification}",
    booktitle = "Proceedings of the 25-th Int. Conf. on Digital Audio Effects (DAFx20in22)",
    location = "Vienna, Austria",
    eventdate = "2022-09-06/2022-09-10",
    year = "2022",
    month = "Sept.",
    publisher = "",
    issn = "2413-6689",
    volume = "3",
    doi = "",
    pages = "256--263"
}
2022, September

On the Challenges of Embedded Real-Time Music Information Retrieval

Domenico Stefani and Luca Turchet

in Proceedings of the 25th International Conference on Digital Audio Effects (DAFx20in22)

Abstract
Real-time applications of Music Information Retrieval (MIR) have recently been gaining interest. However, as deep learning becomes more and more ubiquitous for music analysis tasks, several challenges and limitations need to be overcome to deliver accurate and quick real-time MIR systems. In addition, modern embedded computers offer great potential for compact systems that use MIR algorithms, such as digital musical instruments. However, embedded computing hardware is generally resource-constrained, posing additional limitations. In this paper, we identify and discuss the challenges and limitations of embedded real-time MIR. Furthermore, we discuss potential solutions to these challenges, and demonstrate their validity by presenting an embedded real-time classifier of expressive acoustic guitar techniques. The classifier achieved 99.2% accuracy in distinguishing pitched and percussive techniques and a 99.1% average accuracy in distinguishing four distinct percussive techniques with a fifth class for pitched sounds. The full classification task is a considerably more complex learning problem, with our preliminary results reaching only 56.5% accuracy. The results were produced with an average latency of 30.7 ms.
BibTeX
@inproceedings{stefani2022challenges,
    author = "Stefani, Domenico and Turchet, Luca",
    title = "{On the Challenges of Embedded Real-Time Music Information Retrieval}",
    booktitle = "Proceedings of the 25-th Int. Conf. on Digital Audio Effects (DAFx20in22)",
    location = "Vienna, Austria",
    eventdate = "2022-09-06/2022-09-10",
    year = "2022",
    month = "Sept.",
    publisher = "",
    issn = "2413-6689",
    volume = "3",
    doi = "",
    pages = "177--184"
}
2021, September

Bio-Inspired Optimization of Parametric Onset Detectors

Domenico Stefani and Luca Turchet

in Proceedings of the 24th International Conference on Digital Audio Effects (DAFx20in21), 2021

Abstract
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection, where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
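The Pareto frontier step can be sketched as follows. This is only an illustration of the general idea, not the procedure from the paper: the function `pareto_front`, the dictionary keys, and the candidate scores are hypothetical, and the evolutionary search that would produce such candidates is not shown.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a dict with an "f1" score (higher is better) and a
    "latency_ms" value (lower is better). Candidate o dominates c when o is
    at least as good on both objectives and strictly better on one of them.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["f1"] >= c["f1"]
            and o["latency_ms"] <= c["latency_ms"]
            and (o["f1"] > c["f1"] or o["latency_ms"] < c["latency_ms"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical detector configurations with made-up scores.
candidates = [
    {"name": "A", "f1": 0.95, "latency_ms": 12},
    {"name": "B", "f1": 0.97, "latency_ms": 20},
    {"name": "C", "f1": 0.93, "latency_ms": 15},  # worse than A on both
    {"name": "D", "f1": 0.90, "latency_ms": 6},
    {"name": "E", "f1": 0.97, "latency_ms": 25},  # same f1 as B, slower
]

print([c["name"] for c in pareto_front(candidates)])  # ['A', 'B', 'D']
```

The configurations left on the front represent different accuracy and latency trade-offs, so the final choice among them still depends on the latency budget of the target application.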
BibTeX
@inproceedings{stefani2021bioinspired,
    author = "Stefani, Domenico and Turchet, Luca",
    title = "{Bio-Inspired Optimization of Parametric Onset Detectors}",
    booktitle = "Proc. 24th Int. Conf. on Digital Audio Effects (DAFx20in21)",
    location = "Vienna, Austria",
    eventdate = "2021-09-08/2021-09-10",
    year = "2021",
    month = "Sept.",
    publisher = "",
    issn = "2413-6689",
    volume = "2",
    pages = "268--275",
    doi={10.23919/DAFx51585.2021.9768293}
}
2023, October

Demo: Real-Time Embedded Deep Learning on Elk Audio OS

Domenico Stefani

in 4th International Symposium on the Internet of Sounds (IS2), Pisa, Italy

Abstract
Recent advancements in deep learning architectures for music and the availability of powerful embedded computing platforms for low-latency audio processing have opened up exciting possibilities for new digital and Smart Musical Instruments. However, deploying neural networks to various embedded audio platforms remains a challenge. In particular, while Elk Audio OS on the Raspberry Pi 4 proved to be a capable platform for deep learning and audio processing, the deployment process has not been documented yet. In response, this demo proposal accompanies a systematic guide to deploying deep learning models for audio on embedded systems using Elk Audio OS. The proposed demo will cover the entire process, from creating a compatible code project to executing and diagnosing a VST plugin with deep learning inference on a Raspberry Pi. The demo will explore different approaches for real-time execution of deep learning inference on embedded devices and provide solutions for handling larger neural network models. To facilitate implementation and future updates, an online repository is provided, offering clean code templates, functional examples, and precompiled library binaries for the TensorFlow Lite and ONNX Runtime inference engines. The primary aim of this demo is to help bridge the gap between deep learning model development and real-world deployment on embedded systems, fostering the creation of self-contained digital musical instruments and audio devices equipped with real-time deep learning capabilities. By promoting the deployment of neural networks to embedded devices, this demo seeks to contribute to the development of Smart Musical Instruments capable of providing musicians and audiences with unprecedented services. Attendees will gain insights into the process of deploying deep learning models on embedded computers with Elk Audio OS.
2022, October

Riconoscimento in Tempo Reale di Tecniche Espressive per Chitarra su Embedded Computers

Domenico Stefani

in Proceedings of the XXIII Colloquio di Informatica Musicale/Colloquium of Musical Informatics (CIM), Ancona, Italy

🇬🇧 Abstract
In the last decades of the past century, innovations in analog electronics led to the creation of a series of guitar-based instrument-controllers that could produce a wide range of sounds by directly controlling audio synthesizers. Despite the various improvements brought to these systems over the years, they have always been limited to tracking the frequency and intensity over time of the notes played, failing to consider the subtle expressive techniques generally used by guitarists to mutate the sound of the instrument. In this paper I present my research, which aims to use the most modern techniques of deep learning to enable real-time recognition of the expressive technique used on a guitar. Particular attention will be paid to proposing implementations on embedded devices and considering the limitations of these, in a manner that can enable the creation of new intelligent instruments where signal analysis and synthesis of new sounds is carried out in a self-contained manner. Along with a description of the research and methodology used, the results obtained so far in deep learning for real-time classification of audio, classification of expressive techniques, and onset detection are presented. Finally, the lines of research that will be pursued in the near future are presented.
2022, June

Workshop Talk: Embedded Real-Time Expressive Guitar Technique Recognition

Domenico Stefani

in Embedded AI for NIME: Challenges and Opportunities Workshop

2020, September

Demo of the TimbreID-VST Plugin for Embedded Real-Time Classification of Individual Musical Instrument Timbres

Domenico Stefani and Luca Turchet

in Proceedings of the 27th Conference of the Open Innovations Association (FRUCT), 2020

Abstract
This demo presents the timbreID-VST plugin, an audio plugin in Virtual Studio Technology format dedicated to the embedded real-time classification of individual musical instrument timbres. The plugin was created by porting the code of the timbreID library, a collection of objects for the real-time programming language Pure Data that allows the real-time classification of features of audio signals. The JUCE framework and the build tools provided by Elk Audio OS were used, which allows the plugin to run on the embedded systems supported by Elk Audio OS. The availability of the timbreID-VST plugin utilities as a library facilitates the development of intelligent applications for embedded audio, such as smart musical instruments. The plugin was trained to classify percussive timbres from an acoustic guitar.
BibTeX
@inproceedings{stefani2020demo,
    title={Demo of the TimbreID-VST Plugin for Embedded Real-Time Classification of Individual Musical Instruments Timbres},
    author={Stefani, Domenico and Turchet, Luca},
    booktitle={Proc. 27th Conf. of Open Innovations Association (FRUCT)},
    volume={2},
    pages={412-413},
    year={2020}
}