Tune sources

This page demonstrates the utilities included in the package for loading tune data.

from pyabc2 import Tune
from pyabc2.sources import load_example, norbeck, the_session, eskin, bill_black, hardy

A few examples are included in the package, accessible with pyabc2.sources.load_example() (returns Tune) and pyabc2.sources.load_example_abc() (returns ABC string).

load_example("For the Love of Music")
Tune(title='For The Love Of Music', key=Gmaj, type='slip jig')

The tune source modules, demonstrated below, download tune data from the internet.

Norbeck

norbeck.load() gives us a list of Tunes for one of Norbeck’s tune type groups (e.g. ‘jigs’, ‘reels’, ‘slip jigs’).

tunes = norbeck.load("jigs")
print(len(tunes), "jigs loaded")

tunes[0]
downloading... 
done
556 jigs loaded
Tune(title="Bride's Favourite, The", key=Gmaj, type='jig')
tunes[-1]
Tune(title='Stone Step, The', key=Gmaj, type='jig')

The Session

the_session.load() gives us a list of Tunes loaded from a (frequently updated) archive of all of the tunes in The Session. This is a large dataset, so here we pass n=500 to cap the number of tunes processed.

tunes = the_session.load(n=500)

tunes[0]
downloading... 
done
/tmp/ipykernel_628/93315575.py:1: UserWarning: 22 out of 500 The Session tune(s) failed to load. Enable logging debug messages to see more info.
  tunes = the_session.load(n=500)
Tune(title="'S Ann An Ìle", key=Gmaj, type='strathspey')
tunes[-1]
Tune(title='A Trip To Galloway', key=Dmaj, type='waltz')
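
The warning above suggests enabling debug log messages to find out why some tunes failed to load. A minimal sketch of how that might be done with the standard library logging module follows. The logger name "pyabc2" is an assumption (packages commonly log under their own name); it is not confirmed by the output shown here.

```python
import logging

# Send log records to stderr with a compact format.
logging.basicConfig(format="%(name)s %(levelname)s: %(message)s")

# ASSUMPTION: the package emits its messages through a logger named "pyabc2".
logger = logging.getLogger("pyabc2")
logger.setLevel(logging.DEBUG)

print(logger.isEnabledFor(logging.DEBUG))
```

With this in place before the load call, any debug messages emitted under that logger name would be printed alongside the warning.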
tune = the_session.load_url("https://thesession.org/tunes/21799#setting43712")
tune
Tune(title='The Cherrytree', key=Gmaj, type='jig')
tune.print_measures()
01: G2 d c B G
02: G F G A F D
03: G2 d A G G
04: d e e f g g
05: G2 d c B G
06: G F G A F D
07: B c c g2 B
08: B c g a f d
09: G2 d c B G
10: G F G A F D
11: G2 d A G G
12: d e e f g g
13: G2 d c B G
14: G F G A F D
15: B c c e2 B
16: A G E G A F
17: E B e d B e
18: d e g d B e
19: f B g f B f
20: f g g g a a
21: g a f f g e
22: f g d e d B
23: B d e f g A
24: A B d A G F
25: E B e d B e
26: d e g d B e
27: f B g f B f
28: f g g g a a
29: b2 a a g e
30: f g e e d e
31: d f a e d e
32: d B A A G F
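
The URL passed to load_url() above appears to carry two identifiers: a tune ID in the path and a setting ID in the fragment. The sketch below pulls them apart with the standard library only. The URL layout is inferred from this single example, so treat it as an assumption rather than a description of how pyabc2 parses the URL.

```python
from urllib.parse import urlparse

url = "https://thesession.org/tunes/21799#setting43712"
parts = urlparse(url)

# ASSUMPTION: the path looks like "/tunes/<tune_id>" and the
# fragment, when present, looks like "setting<setting_id>".
tune_id = int(parts.path.rstrip("/").split("/")[-1])
setting_id = int(parts.fragment.removeprefix("setting")) if parts.fragment else None

print(tune_id, setting_id)
```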

Data archive

The Session data archive (https://github.com/adactio/TheSession-data) contains several datasets, which pyabc2.sources.the_session.load_meta() loads as data frames. We can use these in other ways besides parsing ABCs to Tunes.

For example, we can look for the most common ABC notes in the corpus.

%%time

df = the_session.load_meta("tunes", convert_dtypes=True)
df
CPU times: user 846 ms, sys: 127 ms, total: 973 ms
Wall time: 993 ms
tune_id setting_id name type meter mode abc date username composer
0 15326 28560 'S Ann An Ìle strathspey 4/4 Gmajor |:G>A B>G c>A B>G|E<E A>G F<D D2|G>A B>G c>A B... 2016-03-31 15:34:45 danninagh <NA>
1 15326 28582 'S Ann An Ìle strathspey 4/4 Gmajor uD2|:{F}v[G,2G2]uB>ud c>A B>G|{D}E2 uA>uG F<D ... 2016-04-03 09:15:08 DonaldK <NA>
2 14625 26955 'S Daor An Tabac reel 4/4 Bminor |:eAAB eABB|eAAB gedB|eAAB eABB|G2AB gedB:|\r\... 2015-07-31 02:47:47 Charles Mackenzie <NA>
3 5478 5478 'S Iomadh Rud A Chunnaic Mi reel 4/4 Gmajor ABBA GEDE|G2AG EGDG|ABBA GEDE|GEDE G2GA|\r\nAB... 2006-02-03 04:45:46 Andy F <NA>
4 5478 11429 'S Iomadh Rud A Chunnaic Mi reel 4/4 Dmajor |:e|f2fe dBAB|dded B2A2|f2fe dBAB|dBAB d2d:|\r... 2011-08-17 00:57:48 malcombpiper <NA>
... ... ... ... ... ... ... ... ... ... ...
54297 11995 11995 Zoidberg's jig 6/8 Dmajor |:f2f f2a|fed B2A|A2A ABd|e2e ~e2e|\r\nf2f f2a... 2012-06-19 12:12:46 DrugCrazed Patrick Rose
54298 22155 44612 Zolloko San Martinak polka 2/4 Fmajor K:F\r\na|:"A" ad' e'f'|"Dm" (3e'e'e' d'a|ac' b... 2022-08-21 12:24:32 Fernando Durbán Galnares Kepa Junkera
54299 11584 11584 Zonaradikos jig 6/8 Dmajor A|:d3 e3|f3f2e|fga gfe|d2c B2A|\r\nd3 e3|f3f2e... 2011-11-14 16:06:26 gian marco <NA>
54300 9013 9013 Zucchini Reel, The reel 4/4 Dmajor Ac|d2d=c ADFA|G2BG =cGBG|Add=c ADFD|GBAG FDAc|... 2008-10-18 06:05:43 Shelley <NA>
54301 13875 24924 Zuppa Inglese jig 6/8 Gmajor |:GAG GAB|cBc cde|d2B GAB|A2G E2D|\r\nGAG GAB|... 2014-10-07 22:55:26 Edward Nunn <NA>

54302 rows × 10 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54302 entries, 0 to 54301
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   tune_id     54302 non-null  Int64         
 1   setting_id  54302 non-null  Int64         
 2   name        54302 non-null  string        
 3   type        54302 non-null  category      
 4   meter       54302 non-null  category      
 5   mode        54302 non-null  category      
 6   abc         54302 non-null  string        
 7   date        54302 non-null  datetime64[ns]
 8   username    54302 non-null  string        
 9   composer    18288 non-null  string        
dtypes: Int64(2), category(3), datetime64[ns](1), string(4)
memory usage: 3.2 MB
from pyabc2.note import RE_NOTE as rx

rx
re.compile(r"(?P<acc>\^|\^\^|=|_|__)?(?P<note>[a-gA-G])(?P<oct>[,']*)(?P<num>[0-9]+)?(?P<slash>/+)?(?P<den>[0-9]+)?",
           re.UNICODE)
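
To see what each named group in this pattern captures, here is a small standard-library sketch. The pattern is copied from the output above; the variable name note_rx and the sample notes are chosen for illustration.

```python
import re

# Pattern copied from the RE_NOTE output above.
note_rx = re.compile(
    r"(?P<acc>\^|\^\^|=|_|__)?(?P<note>[a-gA-G])(?P<oct>[,']*)"
    r"(?P<num>[0-9]+)?(?P<slash>/+)?(?P<den>[0-9]+)?"
)

# A sharpened c, one octave up, lasting 3/2 of the unit duration.
fancy = note_rx.fullmatch("^c'3/2").groupdict()
print(fancy)

# A bare note: optional groups that did not take part are None,
# while the octave group matches the empty string.
plain = note_rx.fullmatch("A").groupdict()
print(plain)
```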

Note that this regular expression also matches letters outside of tune bodies, such as those in tune titles or ordinary prose.

["".join(tup) for tup in rx.findall("the quick brown fox jumps over the lazy dog")]
['e', 'c', 'b', 'f', 'e', 'e', 'a', 'd', 'g']

But The Session stores the tune body separately (in the abc field) and encourages a bare-bones melody-focused approach, so we can expect to mostly be matching actual notes.

from pprint import pprint

cool = df.query("tune_id == 1 and setting_id == 1")
display(cool.T)

abc = cool.abc.iloc[0]
print(abc, "\n")

pprint([m.group() for m in rx.finditer(abc)], compact=True)
10236
tune_id 1
setting_id 1
name Cooley's
type reel
meter 4/4
mode Edorian
abc |:D2|EBBA B2 EB|B2 AB dBAG|FDAD BDAD|FDAD dAFD...
date 2001-05-14 18:45:18
username Jeremy
composer <NA>
|:D2|EBBA B2 EB|B2 AB dBAG|FDAD BDAD|FDAD dAFD|
EBBA B2 EB|B2 AB defg|afec dBAF|DEFD E2:|
|:gf|eB B2 efge|eB B2 gedB|A2 FA DAFA|A2 FA defg|
eB B2 eBgB|eB B2 defg|afec dBAF|DEFD E2:| 

['D2', 'E', 'B', 'B', 'A', 'B2', 'E', 'B', 'B2', 'A', 'B', 'd', 'B', 'A', 'G',
 'F', 'D', 'A', 'D', 'B', 'D', 'A', 'D', 'F', 'D', 'A', 'D', 'd', 'A', 'F', 'D',
 'E', 'B', 'B', 'A', 'B2', 'E', 'B', 'B2', 'A', 'B', 'd', 'e', 'f', 'g', 'a',
 'f', 'e', 'c', 'd', 'B', 'A', 'F', 'D', 'E', 'F', 'D', 'E2', 'g', 'f', 'e',
 'B', 'B2', 'e', 'f', 'g', 'e', 'e', 'B', 'B2', 'g', 'e', 'd', 'B', 'A2', 'F',
 'A', 'D', 'A', 'F', 'A', 'A2', 'F', 'A', 'd', 'e', 'f', 'g', 'e', 'B', 'B2',
 'e', 'B', 'g', 'B', 'e', 'B', 'B2', 'd', 'e', 'f', 'g', 'a', 'f', 'e', 'c',
 'd', 'B', 'A', 'F', 'D', 'E', 'F', 'D', 'E2']
%%time

note_counts = (
    df.abc
    .str.findall(rx)
    .explode()
    .str.join("")
    .value_counts()
)
note_counts
CPU times: user 26.6 s, sys: 485 ms, total: 27.1 s
Wall time: 27 s
abc
A       733688
d       670685
B       664951
e       558934
c       453372
         ...  
D''          1
f9/          1
A2/4         1
e2//         1
^b4          1
Name: count, Length: 1031, dtype: int64
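
The pandas chain above can be mirrored for a single tune body with the standard library, which may help explain the .str.join("") step. Because the pattern contains groups, findall returns one tuple of group strings per match, and joining the tuple reassembles the note text. This is an illustrative sketch using the opening bars of the setting printed earlier, not a replacement for the vectorized version.

```python
import re
from collections import Counter

# Pattern copied from the RE_NOTE output shown earlier.
note_rx = re.compile(
    r"(?P<acc>\^|\^\^|=|_|__)?(?P<note>[a-gA-G])(?P<oct>[,']*)"
    r"(?P<num>[0-9]+)?(?P<slash>/+)?(?P<den>[0-9]+)?"
)

# Opening bars of the setting printed earlier.
abc = "|:D2|EBBA B2 EB|B2 AB dBAG|"

# findall yields one tuple per match; groups that did not match are "".
tokens = ["".join(groups) for groups in note_rx.findall(abc)]
counts = Counter(tokens)

print(tokens)
print(counts.most_common(2))
```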
note_counts[:20]
abc
A     733688
d     670685
B     664951
e     558934
c     453372
G     449712
f     393893
F     318347
g     305356
E     269217
D     231414
a     204906
A2     97313
d2     94422
B2     80246
G2     75557
b      60225
e2     60202
c2     47538
C      44820
Name: count, dtype: int64

👆 We can see that A (unit duration) is the leader. This makes sense, as A is a prominent scale degree in many of the common keys:

  • 5 in Dmaj

  • 2 in Gmaj

  • 1 in Ador, Amin, Amix, Amaj
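
The scale degrees listed above can be checked with a few lines of arithmetic on letter names. This is a minimal sketch with a hypothetical helper, scale_degree, that compares letters only, so accidentals and mode play no part.

```python
LETTERS = "CDEFGAB"


def scale_degree(note: str, tonic: str) -> int:
    """Return the 1-based degree of a note letter relative to a tonic letter."""
    steps = LETTERS.index(note.upper()) - LETTERS.index(tonic.upper())
    return steps % 7 + 1


for tonic in ["D", "G", "A"]:
    print(f"A is degree {scale_degree('A', tonic)} in {tonic}")
```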

Note

A implies A₄, the A above middle C, the A string on a violin, the lower register on the flute, etc.

Note

In general we don’t know the duration of A without context (L: header field, or based on M: if L: is not set). However, in this case, we know that The Session presets the unit duration to 1/8, so A is an eighth note.
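
As a rough illustration of the fallback mentioned in this note, the sketch below follows the rule as I understand it from the ABC standard (v2.1): with no L: field, a meter below 3/4 implies a unit of 1/16, and anything else implies 1/8. The function name is hypothetical and this is not pyabc2's implementation.

```python
from fractions import Fraction


def default_unit_length(meter: str) -> Fraction:
    """Unit note length implied by an M: value when no L: field is given."""
    if meter in {"C", "C|", "none"}:
        return Fraction(1, 8)
    # Fraction("6/8") parses the string and reduces it to 3/4.
    return Fraction(1, 16) if Fraction(meter) < Fraction(3, 4) else Fraction(1, 8)


for meter in ["2/4", "6/8", "4/4"]:
    print(meter, default_unit_length(meter))

# With the unit fixed at 1/8, as on The Session, "A2" lasts twice the unit.
unit = Fraction(1, 8)
a2 = 2 * unit
print("A2 lasts", a2)
```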

from textwrap import wrap

print("\n".join(wrap("  ".join(note_counts[note_counts == 1].index))))
f'/2  _f4  b,,3  =G8  F5/4  c9  b'2  _A4  _G4  B4/  =A/4  g,/  d16
=c16  E24  _d/  E33  A6//  b5/  A,12  ^c'4  ^c'3  d'/4  ^A3/4  ^B3/4
A'/  a'4  ^G,,  ^A,,  e,,3  e,,2  f,,3  ^f2/3  ^c2/  A11  e2/4  e4/
f4/  =e//  ^g/2  ^A//  b,/2  D/8  E/8  B/8  B,'  f6/2  e6/2  f,3  ^b/
D11  C6/  ^D/2  _a3/2  G,12  _e8  ^c,2  ^B,2  c'3/4  d5/2  e''  ^A10
^a8  _A,6  d6/  a'/  ^G10  ^D6  ^D10  ^D8  d1/3  a8/3  c'1/2  C,12
=B5  A7/9  =F,4  f,8  =D6  e8/3  a/3  B16  G16  =a3  E23  A'4  D22  c7
f5/2  =G6  ^G'  B,9  ^A5  c13  G'4  F'3  ^e/2  c3//  D'/  C'/  B'/2
F,6  E,8  A/1  =B/4  G/1  B,//  _a4  e,2  ^g/8  _e'/  __d  d14  _c'2
e23  ^E,2  =E,2  B32  =c'4  _c'  E75  =C,  ^F,6  =F,2  =B,6  B,1  D,1
^C,2  ^C,  _g/  G,,4  e'3/2  ^f'2  D,'3  _d'3  =F,3  =F,,3  E,,3
_B,3/2  =c1  _d6  B7/4  _c3  A,9  =c2/3  =A4  D2/3  E6/  G6/  A4/3
^a//  F////  F///  =c'3/2  =e6  B11  e'1  D''  f9/  A2/4  e2//  ^b4

👆 A variety of ABC note specs appear only once. Many of these have unusual durations or accidentals.

What if we ignore everything except the natural note name?

nat_cased_counts = (
    note_counts
    .reset_index(drop=False)
    .rename(columns={"abc": "note"})
    .assign(nat=lambda df: df.note.str.extract(r"([a-gA-G])"))
    .groupby("nat")
    .aggregate({"count": "sum"})["count"]
    .sort_values(ascending=False)
)
nat_cased_counts
nat
A    925044
B    846010
d    827624
e    665590
G    590607
c    577271
f    474448
g    383882
F    382747
E    342141
D    300375
a    251176
b     71519
C     54919
Name: count, dtype: int64

👆 A is still our leader, but otherwise things have shifted a bit. The note C, which generally implies a pitch outside of the range of most whistles and flutes, has the lowest count. Although b is inside that range, many tunes don’t have one.
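
One further step, sketched here with the standard library, is to ignore case as well, so that the two octaves of each letter are pooled. The counts are copied from the nat_cased_counts output above; the pooling itself is an illustration rather than part of the original analysis.

```python
from collections import Counter

# Counts copied from the nat_cased_counts output above.
nat_cased = {
    "A": 925044, "B": 846010, "d": 827624, "e": 665590, "G": 590607,
    "c": 577271, "f": 474448, "g": 383882, "F": 382747, "E": 342141,
    "D": 300375, "a": 251176, "b": 71519, "C": 54919,
}

# Pool upper- and lower-case spellings of the same letter.
letter_counts = Counter()
for name, count in nat_cased.items():
    letter_counts[name.upper()] += count

for letter, count in letter_counts.most_common():
    print(letter, count)
```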

from pyabc2 import Note

(
    nat_cased_counts
    .to_frame()
    .assign(value=lambda df: df.index.map(lambda x: Note.from_abc(x).value))
    .sort_values("value")["count"]
    .plot.bar(
        xlabel="ABC letters\n(accidentals, octave indicators, and context in key ignored)",
        rot=0,
        ylabel="Count",
        title="ABC prevalence in The Session",
    )
);
[Figure: bar chart of Count per ABC letter (accidentals, octave indicators, and context in key ignored), with bars ordered by pitch value.]

Eskin

Michael Eskin has tunebooks available at https://michaeleskin.com/tunebooks.html, viewable with his ABC Transcription Tools.

We can load selected tunebooks from there, e.g. the King Street Sessions:

df = eskin.load_meta("kss")
df
downloading... 
done
name abc group
0 The bonniest lass in the world X: 938\nT:The bonniest lass in the world\nR:Re... airs_songs
1 The Brae's of Lochiel X: 939\nT:The Braes of Lochiel\nT:Braigh Loch ... airs_songs
2 Farewell to whiskey X: 940\nT:Farewell to whiskey\nR:Air\nO:Scotla... airs_songs
3 Galen's arrival X: 941\nT:Galen's arrival\nR:Reel\nO:Scotland\... airs_songs
4 Give me your hand X: 942\nT:Give me your hand\nR:Air\nQ:180\nC:R... airs_songs
... ... ... ...
1001 Waiting for Peter X: 933\nT:Waiting for Peter\nR:Waltz\nC:Lee An... waltzes
1002 Waltz of the toys X: 934\nT:Waltz of the toys\nR:Waltz\nC:Michel... waltzes
1003 West of the River Shannon X: 935\nT:West of the River Shannon\nR:Waltz\n... waltzes
1004 Westphalia waltz X: 936\nT:Westphalia waltz\nR:Waltz\nB:The Wal... waltzes
1005 Wind on the heath X: 937\nT:Wind on the heath\nR:Waltz\nO:Scotla... waltzes

1006 rows × 3 columns

df.group.value_counts()
group
reels          353
jigs           260
hornpipes       75
scotchreels     64
polkas          46
slipjigs        37
strathspeys     33
waltzes         29
ocarolan        25
misc_tunes      21
marches         21
slides          19
airs_songs      16
long_dances      7
Name: count, dtype: int64
Tune(df.query("group == 'jigs'").iloc[0].abc)
Tune(title='The academy jig', key=Gmaj, type='Jig')
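
The capitalized type ('Jig') reported above presumably reflects the R: (rhythm) header field of the tune block, as hinted by the R: entries visible in the abc column. As a rough sketch of how header fields can be read from a tune block with the standard library, consider the following. The tune block is hypothetical, written for illustration, and the parsing is not pyabc2's own.

```python
import re

# HYPOTHETICAL tune block, written for illustration (not from the tunebook).
abc = """X: 1
T: Example jig
R: Jig
M: 6/8
K: Gmaj
GAB cBA|G3 G3|]
"""

# Header lines have the form "<letter>:<value>"; the K: field ends the header.
header = {}
for line in abc.splitlines():
    m = re.match(r"([A-Za-z]):\s*(.*)", line)
    if m is None:
        break
    key, value = m.group(1), m.group(2).strip()
    header.setdefault(key, value)  # keep the first occurrence of each field
    if key == "K":
        break

print(header)
```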
from IPython.display import display, Markdown

url = "https://michaeleskin.com/abctools/abctools.html?lzw=BoLgUAKiBiD2BOACCALApogMrAbhg8gGaICyArgM4CWAxmAEogUA2VADogFZUDmYAwiExUAXon4BDePFjNmYEiACcAegAcYTCACM6sAGkQAcTBGAogBFEFs0cQBBIwCFEAHwdG7zgCaI0333dzKxs7Rxo3RCc0DCd7F3MzRBBXMB5-PxVCFR4EpxUaFUDEdN80HgAjRAkAJmJ3Uszs3Id8wuL-F28nMKdAtIy0LJy8gqLIxvKq2olIipimnIxankjOxG7e+zdUoA"
display(Markdown(f"<{url}>"))
eskin.load_url(url)

Bill Black

Bill Black has an extensive ABC library, available at http://www.capeirish.com/ittl/. We can load all of the tune blocks (strings) with pyabc2.sources.bill_black.load_meta().

abcs = bill_black.load_meta()
len(abcs)
downloading... 
done
10249
Tune(abcs[0])
Tune(title='A-D POLKA', key=Dmaj, type='?')

Hardy

Paul Hardy has a tunebook collection available at https://pghardy.net/tunebooks/. We can load selected tunebooks as a list of tune blocks (strings) with pyabc2.sources.hardy.load_meta().

abcs = hardy.load_meta("basic")
len(abcs)
downloading... 
done
58
Tune(abcs[0])
Tune(title='Ash Grove, The', key=Gmaj, type='Waltz')
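
Tune blocks like the ones returned above conventionally each begin with an X: (reference number) field. The sketch below shows one way a tunebook string could be split into blocks on that basis, using a hypothetical two-tune tunebook. It is an assumption about the general approach, not a description of what pyabc2 does internally.

```python
import re

# HYPOTHETICAL two-tune tunebook, written for illustration.
book = """X: 1
T: First tune
K: G
GABc d2|]

X: 2
T: Second tune
K: D
DEFG A2|]
"""

# Split just before every line that starts with "X:" and drop empty pieces.
pieces = re.split(r"^(?=X:)", book, flags=re.MULTILINE)
blocks = [piece.strip() for piece in pieces if piece.strip()]

print(len(blocks))
print(blocks[1].splitlines()[1])
```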