This thesis combines audio analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective. It focuses on the information provided by the visual layer of music videos and how it can be harnessed to augment and improve tasks of the MIR research domain. The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone, without the sound being heard. This leads to the hypothesis that there exists a visual language that is used to express mood or genre. It follows that this visual information is music-related and should therefore be beneficial for corresponding MIR tasks such as music genre classification or mood recognition. A series of comprehensive experiments and evaluations is conducted, focused on the extraction of visual information and its application in different MIR tasks. A custom dataset is created, suitable for developing and testing visual features that are able to represent music-related information. Evaluations range from low-level visual features to high-level concepts retrieved by means of Deep Convolutional Neural Networks.

To evaluate the proposed method, we compile the new OLGA dataset, which contains artist similarities from AllMusic, together with content features from AcousticBrainz. With 17,673 artists, this is the largest academic artist similarity dataset that includes content-based features to date. Moreover, we also showcase the scalability of our approach by experimenting with a much larger proprietary dataset. Results show the superiority of the proposed approach over current state-of-the-art methods for music similarity. Finally, we hope that the OLGA dataset will facilitate research on data-driven models for artist similarity.
Artist similarity plays an important role in organizing, understanding, and subsequently, facilitating discovery in large collections of music. In this paper, we present a hybrid approach to computing similarity between artists using graph neural networks trained with triplet loss. The novelty of using a graph neural network architecture is to combine the topology of a graph of artist connections with content features to embed artists into a vector space that encodes similarity. Additionally, we propose a simple and effective regularization method, 'connection dropout', which aims at improving results for long-tail artists, for which few existing connections are known. To evaluate the proposed method, we use two datasets: the open OLGA dataset, which contains artist similarities from AllMusic, together with content features from AcousticBrainz, and a larger, proprietary dataset. We find that using graph neural networks yields superior overall results compared to state-of-the-art methods. Beyond the overall evaluation, we investigate the effectiveness of the proposed model for long-tail artists. Such artists may benefit less from graph-based methods, since they typically have few known connections. We show that the proposed regularization approach clearly improves the performance for long-tail artists without negatively affecting results for well-connected ones: it computes high-quality embeddings and good similarity scores for everyone.
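The three ingredients named above (a graph convolution that mixes artist connections with content features, a hinge triplet loss on the resulting embeddings, and 'connection dropout' on the similarity graph) can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed shapes and a single mean-aggregation layer, not the paper's actual architecture; all sizes, names, and the specific layer form are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative sizes, not from the paper):
# 6 artists, 8-dim content features, a symmetric graph of known connections.
n, d_in, d_out = 6, 8, 4
X = rng.normal(size=(n, d_in))                 # content features per artist
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:  # known similarity connections
    A[i, j] = A[j, i] = 1.0

def connection_dropout(A, p, rng):
    """Randomly remove a fraction p of the known connections (training only)."""
    keep = rng.random(A.shape) >= p
    keep = np.triu(keep, 1)                    # decide each undirected edge once
    keep = keep + keep.T                       # keep the graph symmetric
    return A * keep

def gnn_layer(A, H, W):
    """Mean-aggregate neighbour features (with self-loops), then linear map + ReLU."""
    A_hat = A + np.eye(len(A))                 # self-loop keeps the artist's own content
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum(0.0, (A_hat / deg) @ H @ W)

def triplet_loss(E, anchor, pos, neg, margin=1.0):
    """Hinge triplet loss: pull the positive closer than the negative by a margin."""
    d_ap = np.linalg.norm(E[anchor] - E[pos])
    d_an = np.linalg.norm(E[anchor] - E[neg])
    return max(0.0, d_ap - d_an + margin)

# One training-style forward pass with connection dropout applied to the graph.
W = rng.normal(size=(d_in, d_out)) * 0.1
A_train = connection_dropout(A, p=0.3, rng=rng)
E = gnn_layer(A_train, X, W)                   # artist embeddings

# Artist 0 and its neighbour 1 should end up closer than unconnected artist 5.
loss = triplet_loss(E, anchor=0, pos=1, neg=5)
print(E.shape, round(loss, 3))
```

Because some of artist 0's connections may be dropped during training, the model is forced to rely on content features as well as graph topology, which is the intuition behind why connection dropout helps long-tail artists with few known connections.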