{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Tune sources\n", "\n", "Here we demonstrate the included utilities for loading tune data." ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "from pyabc2.sources import load_example, norbeck, the_session" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "A few examples are included in the package, accessible with {func}`pyabc2.sources.load_example` (returns a {class}`~pyabc2.Tune`) and {func}`pyabc2.sources.load_example_abc` (returns an ABC string)." ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "load_example(\"For the Love of Music\")" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "The tune source modules, demonstrated below, download tune data from the internet." ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "## Norbeck\n", "\n", "{func}`norbeck.load()` gives us a list of {class}`~pyabc2.Tune`s for one of [Norbeck's](https://www.norbeck.nu/abc/) tune type groups (e.g. 'jigs', 'reels', 'slip jigs')." ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "tunes = norbeck.load(\"jigs\")\n", "print(len(tunes), \"jigs loaded\")\n", "\n", "tunes[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "tunes[-1]" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "## The Session\n", "\n", "{func}`the_session.load()` gives us a list of {class}`~pyabc2.Tune`s loaded from a (frequently updated) archive of all of the tunes in [The Session](https://thesession.org/). This is a large dataset, so here we cap the number of tunes processed (`n=500`)." 
] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "tunes = the_session.load(n=500)\n", "\n", "tunes[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": {}, "outputs": [], "source": [ "tunes[-1]" ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "tune = the_session.load_url(\"https://thesession.org/tunes/21799#setting43712\")\n", "tune" ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "tune.print_measures()" ] }, { "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "### Data archive\n", "\n", "The Session data archive has many datasets ({func}`pyabc2.sources.the_session.load_meta`),\n", "which we can use in other ways besides parsing ABCs to {class}`~pyabc2.Tune`s." ] }, { "cell_type": "markdown", "id": "14", "metadata": {}, "source": [ "For example, we can look for the most common ABC notes in the corpus." ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "df = the_session.load_meta(\"tunes\", convert_dtypes=True)\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": {}, "outputs": [], "source": [ "df.info()" ] }, { "cell_type": "code", "execution_count": null, "id": "17", "metadata": {}, "outputs": [], "source": [ "from pyabc2.note import _RE_NOTE as rx\n", "\n", "rx" ] }, { "cell_type": "markdown", "id": "18", "metadata": {}, "source": [ "This regular expression also matches letters in ordinary text, such as tune titles." 
] }, { "cell_type": "code", "execution_count": null, "id": "19", "metadata": {}, "outputs": [], "source": [ "[\"\".join(tup) for tup in rx.findall(\"the quick brown fox jumps over the lazy dog\")]" ] }, { "cell_type": "markdown", "id": "20", "metadata": {}, "source": [ "But The Session stores the tune body separately (in the `abc` field) and encourages a bare-bones melody-focused approach, so we can expect to mostly be matching actual notes." ] }, { "cell_type": "code", "execution_count": null, "id": "21", "metadata": {}, "outputs": [], "source": [ "from pprint import pprint\n", "\n", "cool = df.query(\"tune_id == 1 and setting_id == 1\")\n", "display(cool.T)\n", "\n", "abc = cool.abc.iloc[0]\n", "print(abc, \"\\n\")\n", "\n", "pprint([m.group() for m in rx.finditer(abc)], compact=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "22", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "note_counts = (\n", " df.abc\n", " .str.findall(rx)\n", " .explode()\n", " .str.join(\"\")\n", " .value_counts()\n", ")\n", "note_counts" ] }, { "cell_type": "code", "execution_count": null, "id": "23", "metadata": {}, "outputs": [], "source": [ "note_counts[:20]" ] }, { "cell_type": "markdown", "id": "24", "metadata": {}, "source": [ "👆 We can see that `A` (unit duration) is the leader, being a prominent pitch in many of the common keys. As a scale degree, it is:\n", "* 5 in Dmaj\n", "* 2 in Gmaj\n", "* 1 in Ador, Amin, Amix, Amaj\n", "\n", "```{note}\n", "`A` implies A₄, the A above middle C, the A string on a violin, the lower register on the flute, etc.\n", "```\n", "\n", "```{note}\n", "In general we don't know the duration of `A` without context (`L:` header field, or based on `M:` if `L:` is not set).\n", "However, in this case, we know that The Session presets the unit duration to `1/8`,\n", "so `A` is an eighth note.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "25", "metadata": {}, "outputs": [], "source": [ "from textwrap import wrap\n", "\n", 
"print(\"\\n\".join(wrap(\" \".join(note_counts[note_counts == 1].index))))" ] }, { "cell_type": "markdown", "id": "26", "metadata": {}, "source": [ "👆 A variety of ABC note specs appear only once. Many of these have unusual durations or accidentals." ] }, { "cell_type": "markdown", "id": "27", "metadata": {}, "source": [ "What if we ignore everything except the natural note name?" ] }, { "cell_type": "code", "execution_count": null, "id": "28", "metadata": {}, "outputs": [], "source": [ "nat_cased_counts = (\n", " note_counts\n", " .reset_index(drop=False)\n", " .rename(columns={\"index\": \"note\", \"abc\": \"count\"})\n", " .assign(nat=lambda df: df.note.str.extract(r\"([a-gA-G])\"))\n", " .groupby(\"nat\")\n", " .aggregate({\"count\": \"sum\"})[\"count\"]\n", " .sort_values(ascending=False)\n", ")\n", "nat_cased_counts" ] }, { "cell_type": "markdown", "id": "29", "metadata": {}, "source": [ "👆 `A` is still our leader, but otherwise things have shifted a bit.\n", "Note `C`, which generally implies a pitch outside of the range of most whistles and flutes,\n", "has the lowest count.\n", "Although `b` is inside that range, many tunes don't have one." 
] }, { "cell_type": "code", "execution_count": null, "id": "30", "metadata": {}, "outputs": [], "source": [ "from pyabc2 import Note\n", "\n", "(\n", " nat_cased_counts\n", " .to_frame()\n", " .assign(value=lambda df: df.index.map(lambda x: Note.from_abc(x).value))\n", " .sort_values(\"value\")[\"count\"]\n", " .plot.bar(\n", " xlabel=\"ABC letters\\n(accidentals, octave indicators, and context in key ignored)\",\n", " rot=0,\n", " ylabel=\"Count\",\n", " title=\"ABC prevalence in The Session\",\n", " )\n", ");" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 5 }