{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Tune sources\n",
    "\n",
    "Here we demonstrate included utilities for loading tune data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyabc2.sources import load_example, norbeck, the_session"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "A few examples are included in the package, accessible with {func}`pyabc2.sources.load_example` (returns {class}`~pyabc2.Tune`) and {func}`pyabc2.sources.load_example_abc` (returns ABC string)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "load_example(\"For the Love of Music\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "The tune source modules, demonstrated below, download tune data from the internet."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "## Norbeck\n",
    "\n",
    "{func}`norbeck.load() <pyabc2.sources.norbeck.load>` gives us a list of {class}`~pyabc2.Tune`s for one of [Norbeck's](https://www.norbeck.nu/abc/) tune type groups (e.g. 'jigs', 'reels', 'slip jigs')."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "tunes = norbeck.load(\"jigs\")\n",
    "print(len(tunes), \"jigs loaded\")\n",
    "\n",
    "tunes[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "tunes[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "## The Session\n",
    "\n",
    "{func}`the_session.load() <pyabc2.sources.the_session.load>` gives us a list of {class}`~pyabc2.Tune`s loaded from a (frequently updated) archive of all of the tunes in [The Session](https://thesession.org/). This is a large dataset, so here we cap the processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "tunes = the_session.load(n=500)\n",
    "\n",
    "tunes[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "tunes[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "tune = the_session.load_url(\"https://thesession.org/tunes/21799#setting43712\")\n",
    "tune"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "tune.print_measures()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "### Data archive\n",
    "\n",
    "The Session data archive (<https://github.com/adactio/TheSession-data>) has many datasets ({func}`pyabc2.sources.the_session.load_meta`),\n",
    "which we can use in other ways besides parsing ABCs to {class}`~pyabc2.Tune`s."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "For example, we can look for the most common ABC notes in the corpus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "df = the_session.load_meta(\"tunes\", convert_dtypes=True)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyabc2.note import _RE_NOTE as rx\n",
    "\n",
    "rx"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": [
    "This regular expression does also match letters in tune titles, say."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "[\"\".join(tup) for tup in rx.findall(\"the quick brown fox jumps over the lazy dog\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "But The Session stores the tune body separately (in the `abc` field) and encourages a bare-bones melody-focused approach, so we can expect to mostly be matching actual notes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pprint import pprint\n",
    "\n",
    "cool = df.query(\"tune_id == 1 and setting_id == 1\")\n",
    "display(cool.T)\n",
    "\n",
    "abc = cool.abc.iloc[0]\n",
    "print(abc, \"\\n\")\n",
    "\n",
    "pprint([m.group() for m in rx.finditer(abc)], compact=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "note_counts = (\n",
    "    df.abc\n",
    "    .str.findall(rx)\n",
    "    .explode()\n",
    "    .str.join(\"\")\n",
    "    .value_counts()\n",
    ")\n",
    "note_counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "note_counts[:20]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "👆 We can see that `A` (unit duration) is the leader, being a prominent pitch in many of the common keys.\n",
    "* 5 in Dmaj\n",
    "* 2 in Gmaj\n",
    "* 1 in Ador, Amin, Amix, Amaj\n",
    "\n",
    "```{note}\n",
    "`A` implies A₄, the A above middle C, the A string on a violin, the lower register on the flute, etc.\n",
    "```\n",
    "\n",
    "```{note}\n",
    "In general we don't know the duration of `A` without context (`L:` header field, or based on `M:` if `L:` is not set).\n",
    "However, in this case, we know that the The Session presets the unit duration to `1/8`,\n",
    "so `A` is an eighth note.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "from textwrap import wrap\n",
    "\n",
    "print(\"\\n\".join(wrap(\"  \".join(note_counts[note_counts == 1].index))))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "👆 A variety of ABC note specs appear only once. Many of these have unusual durations or accidentals."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27",
   "metadata": {},
   "source": [
    "What if we ignore everything except the natural note name?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "nat_cased_counts = (\n",
    "    note_counts\n",
    "    .reset_index(drop=False)\n",
    "    .rename(columns={\"index\": \"note\", \"abc\": \"count\"})\n",
    "    .assign(nat=lambda df: df.note.str.extract(r\"([a-gA-G])\"))\n",
    "    .groupby(\"nat\")\n",
    "    .aggregate({\"count\": \"sum\"})[\"count\"]\n",
    "    .sort_values(ascending=False)\n",
    ")\n",
    "nat_cased_counts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "👆 `A` is still our leader, but otherwise things have shifted a bit.\n",
    "Note `C`, which generally implies a pitch outside of the range of most whistles and flutes,\n",
    "has the lowest count.\n",
    "Although `b` is inside that range, many tunes don't have one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyabc2 import Note\n",
    "\n",
    "(\n",
    "    nat_cased_counts\n",
    "    .to_frame()\n",
    "    .assign(value=lambda df: df.index.map(lambda x: Note.from_abc(x).value))\n",
    "    .sort_values(\"value\")[\"count\"]\n",
    "    .plot.bar(\n",
    "        xlabel=\"ABC letters\\n(accidentals, octave indicators, and context in key ignored)\",\n",
    "        rot=0,\n",
    "        ylabel=\"Count\",\n",
    "        title=\"ABC prevalance in The Session\",\n",
    "    )\n",
    ");"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}