# Lastfm Archive

A tool for creating a local file archive and Livebook analytics from Lastfm music listening data (scrobbles).

♫♫ Check out the new [on this day](#livebook-analytics) analytics Livebook.

## Usage

Download and create a file archive of Lastfm scrobble tracks via an Elixir
application or interactive Elixir (`iex`), started by running `iex -S mix`
from the software home directory:

    # archive all data of the default user specified in configuration
    LastfmArchive.sync # subsequent calls download only the latest scrobbles

    # archive all data of any Lastfm user
    # the data is stored in a directory named after the user
    LastfmArchive.sync("a_lastfm_user")

You can also deploy and use the tool in Livebook; see:
- [Guides for archiving and data transforms](#livebook-guides) 
- [Livebook analytics](#livebook-analytics)

Scrobbles are downloaded from the Lastfm API on demand and stored in the file
archive on a per-day basis. The software has a built-in cache that records
previous downloads, so a later run resumes where the last one stopped: it skips
scrobbles that are already archived and makes no further API requests for them.
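
The skip-and-resume behaviour can be pictured with a minimal sketch. It is not the library's implementation: the `SyncSketch` module, the `MapSet` of already synced days and the `fetch_fun` callback are all hypothetical stand-ins for the built-in cache and the API client.

```elixir
defmodule SyncSketch do
  @moduledoc "Hypothetical illustration of cache-based resumption, not LastfmArchive code."

  # Fetch only the days missing from the cache and return the updated cache.
  # `fetch_fun` stands in for an API request for one day of scrobbles.
  def sync(days, cache, fetch_fun) when is_function(fetch_fun, 1) do
    Enum.reduce(days, cache, fn day, acc ->
      if MapSet.member?(acc, day) do
        # already archived: skip without calling the API
        acc
      else
        fetch_fun.(day)
        MapSet.put(acc, day)
      end
    end)
  end
end

days = Date.range(~D[2022-12-29], ~D[2022-12-31])
cache = MapSet.new([~D[2022-12-29]])

# Only the days absent from the cache trigger a (simulated) request.
cache = SyncSketch.sync(days, cache, fn day -> IO.puts("fetching #{day}") end)
IO.inspect(MapSet.size(cache))
```

Calling `SyncSketch.sync/3` again with the returned cache performs no fetches at all, which mirrors the "subsequent calls download only the latest scrobbles" behaviour described earlier.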

The stored data is in the raw `recenttracks` JSON format, chunked into
`gzip`-compressed pages of at most 200 tracks and stored within directories
corresponding to the days when the tracks were scrobbled. The file archive
resides in a main directory specified in configuration (see [Configuration](#configuration)).
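
To make the layout concrete, the sketch below chunks tracks into pages of at most 200 and round-trips a page through `gzip`. The `ArchiveLayoutSketch` module and the `<data_dir>/<user>/<yyyy>/<mm>/<dd>` directory scheme are assumptions made for illustration; check the archive on disk for the actual naming.

```elixir
defmodule ArchiveLayoutSketch do
  @per_page 200

  # Assumed directory scheme: <data_dir>/<user>/<yyyy>/<mm>/<dd>
  def day_dir(data_dir, user, %Date{} = date) do
    Path.join([data_dir, user, Calendar.strftime(date, "%Y/%m/%d")])
  end

  # Split one day's tracks into pages of at most 200 tracks each.
  def paginate(tracks), do: Enum.chunk_every(tracks, @per_page)

  # gzip-compress an already encoded JSON page, and read it back.
  def compress(json) when is_binary(json), do: :zlib.gzip(json)
  def decompress(gzipped) when is_binary(gzipped), do: :zlib.gunzip(gzipped)
end

# 450 placeholder tracks are split into three pages.
tracks = Enum.map(1..450, fn i -> %{"name" => "track #{i}"} end)
page_sizes = tracks |> ArchiveLayoutSketch.paginate() |> Enum.map(&length/1)
IO.inspect(page_sizes)

# A compressed page decompresses back to the same JSON text.
json = ~s({"recenttracks": {"track": []}})
^json = json |> ArchiveLayoutSketch.compress() |> ArchiveLayoutSketch.decompress()

IO.puts(ArchiveLayoutSketch.day_dir("./lastfm_data/", "a_lastfm_user", ~D[2022-12-31]))
```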

### Transform into columnar storage formats

You can transform the file archive into other common storage formats, such as CSV,
and columnar data structures such as Apache Parquet.
These formats facilitate data interoperability as well as
OLAP and analytics use cases.

    # transform the file archive into columnar Apache Parquet files
    LastfmArchive.transform("a_lastfm_user", format: :parquet)

    # to columnar Apache Arrow IPC files
    LastfmArchive.transform("a_lastfm_user", format: :ipc_stream)

    # CSV format also available
    LastfmArchive.transform("a_lastfm_user", format: :csv)

Available formats: 
- CSV (tab-delimited)
- Apache Arrow columnar format
- Apache Parquet columnar format

See `transform/2` for details.

### Read archive

The tool provides a `read/2`
function for retrieving data from the archive. It relies mainly on
Elixir Explorer data frames
to underpin further data I/O, manipulation, analytics and visualisation.

The function returns a lazy `t:Explorer.DataFrame.t/0`.

#### From raw data file archive

Scrobbles stored in the file archive can be read 
with a `day` or `month` option:

    # read a single day of scrobbles for the configured default user
    LastfmArchive.default_user() |> LastfmArchive.read(day: ~D[2022-12-31])

    # read a single month of scrobbles for a user, given an arbitrary day of that month
    LastfmArchive.read("a_lastfm_user", month: ~D[2022-12-01])
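
One way to picture the `month` option: any day within a month identifies the same calendar month. The sketch below uses only the standard `Date` module; how `read/2` resolves the option internally is an assumption here, and `MonthSketch` is a hypothetical name.

```elixir
defmodule MonthSketch do
  # Expand an arbitrary day into the range of days covering its month.
  def days_of_month(%Date{} = day) do
    Date.range(Date.beginning_of_month(day), Date.end_of_month(day))
  end
end

range = MonthSketch.days_of_month(~D[2022-12-15])
IO.inspect({range.first, range.last, Enum.count(range)})
```

Under this reading, `~D[2022-12-01]` and `~D[2022-12-15]` would select the same set of daily directories.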

#### From columnar archive for analytics

More data can be loaded from an existing columnar archive.
`read/2` can return the scrobbles of a single year
or all scrobbles, i.e. **the entire dataset**.
A `columns` option is available to retrieve only a subset of columns.

    # load all 2023 data from a Parquet archive
    LastfmArchive.read("a_lastfm_user", format: :parquet, year: 2023)

    # load all data from an Arrow IPC archive
    LastfmArchive.read("a_lastfm_user", format: :ipc_stream)

    # load data from specific columns
    LastfmArchive.read("a_lastfm_user", format: :parquet, columns: [:id, :artist, :album])

See `read/2` for details.

## Livebook analytics

The On this day ♫ Livebook presents
analytics of all music played on this day (today) over the years. The page also features an interactive
Kino explorer to help delve into the data.


![on this day most played analytics](assets/img/livebook_on_this_day_most_played_analytics.png)

## Livebook guides

`LastfmArchive` also provides the following interactive, step-by-step Livebook guides:
  - Creating a file archive: a guide for creating a local file archive consisting of data fetched from the Lastfm API. It provides heatmap and count visualisations for checking ongoing archiving status.

    ![archiving progress visualisation](assets/img/livebook_heatmap.png)
  - Columnar data transforms: a guide for transforming the local file archive into columnar data formats (Arrow, Parquet). It demonstrates how `read/2` can be used to load single-year, single-column data, as well as an entire dataset, into a data frame for various analytics.

    ![unique tracks by artists analytics](assets/img/livebook_unique_tracks_analytics.png)

## Other usage
To load all transformed CSV data from the archive into Solr:

    # define a Solr endpoint with %Hui.URL{} struct
    headers = [{"Content-type", "application/json"}]
    url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}

    LastfmArchive.load_archive("a_lastfm_user", url)

The function finds the CSV files in the archive and sends them to
Solr for ingestion one at a time. It uses the `Hui` client to interact
with Solr and the `t:Hui.URL.t/0` struct to specify the Solr endpoint.

## Requirement

This tool requires Elixir and Erlang (see the installation details
for various operating systems), or Livebook.

## Installation

`lastfm_archive` is available in Hex.
The package can be installed by adding `lastfm_archive`
to your list of dependencies in `mix.exs`:

    def deps do
      [{:lastfm_archive, "~> 0.11"}]
    end

Documentation can be found at [](

## Configuration
Add the following entries to your config file, `config/config.exs`. For example,
the following specifies a Lastfm `user` and a main file location for
multiple user archives, `./lastfm_data/`, relative to the software home directory.

You also need to specify a `lastfm_api_key` in the config so that the application can
access the Lastfm API.

    config :lastfm_archive,
      user: "default_user", # the default user
      data_dir: "./lastfm_data/", # main directory for multiple archives
      lastfm_api_key: "api_key_provided_by_lastfm",
      per_page: 200, # 200 is max no. of tracks per call permitted by Lastfm API
      interval: 1000 # milliseconds between requests cf. Lastfm's max 5 reqs/s rate limit

    # optional: Solr endpoint for Lastfm data loading
    config :hui, :lastfm_archive,
      url: "http://localhost:8983/solr/lastfm_archive",
      handler: "update",
      headers: [{"Content-type", "application/json"}]


See `sync/2`
for other configurable archiving options, e.g. `interval`, `per_page`.
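
As a rough illustration of how the `interval` and `per_page` settings could be read and applied, the sketch below looks them up with `Application.get_env/3` and pauses between simulated requests. The `ThrottleSketch` module and its `request_fun` callback are hypothetical, and the fallback values simply mirror the example configuration; the library's own request scheduling may differ.

```elixir
defmodule ThrottleSketch do
  # Read a setting from the :lastfm_archive application environment,
  # falling back to the values used in the example configuration.
  def setting(:interval), do: Application.get_env(:lastfm_archive, :interval, 1000)
  def setting(:per_page), do: Application.get_env(:lastfm_archive, :per_page, 200)

  # Call `request_fun` once per page, sleeping `interval` ms after each call.
  def each_page(pages, request_fun, interval \\ setting(:interval)) do
    Enum.map(pages, fn page ->
      result = request_fun.(page)
      Process.sleep(interval)
      result
    end)
  end
end

# 1000 ms between requests stays well under a 5 requests/second limit.
IO.inspect(ThrottleSketch.setting(:interval))

# Simulated requests with a short interval to keep the example quick.
1..3
|> ThrottleSketch.each_page(fn page -> "page #{page}" end, 10)
|> IO.inspect()
```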

See `Hui` for more details on Solr configuration.

An `api_key` must be configured to enable Lastfm API requests,
see []( ("Get an API account").