# Lastfm Archive

A tool for extracting and archiving music listening data (scrobbles) from Lastfm.

- Check out the new [Facets archiving](#facets-archiving) Livebook.
- Previous analytics features, including the "on this day" Livebook, have been migrated to coda.
  Visit coda for future Lastfm analytics.

## Usage

Download and create a file archive of Lastfm scrobble tracks via an Elixir
application or interactive Elixir (`iex`), started by running `iex -S mix`
from the software home directory:

```elixir
  # archive all data of a default user specified in configuration
  LastfmArchive.sync # subsequent calls download only latest scrobbles

  # archive all data of any Lastfm user
  # the data is stored in directory named after the user
  LastfmArchive.sync("a_lastfm_user")
```

You can also deploy and use the tool in Livebook,
as shown in various [Livebook guides](#livebook-guides).

Scrobbles are downloaded via the API and stored in the file archive on demand
and on a daily basis. The software has a built-in cache that remembers
previous downloads and resumes from them: it skips scrobbles that are already
downloaded and makes no further API requests for them.
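
The skip-and-resume behaviour can be pictured with a minimal sketch. It assumes, purely for illustration, that the cache is a set of days that are already archived. The module `SyncCacheSketch` and its function are hypothetical and not part of `LastfmArchive`; the actual cache may be organised differently.

```elixir
defmodule SyncCacheSketch do
  @moduledoc "Hypothetical illustration of skipping days that are already cached."

  @doc """
  Returns the days between `first_day` and `last_day` (inclusive) that are
  not in `cached_days` and would therefore still need an API request.
  """
  def pending_days(%Date{} = first_day, %Date{} = last_day, %MapSet{} = cached_days) do
    first_day
    |> Date.range(last_day)
    |> Enum.reject(&MapSet.member?(cached_days, &1))
  end
end

# Assumed cache content: two days were archived by an earlier run
cached = MapSet.new([~D[2023-01-01], ~D[2023-01-03]])
IO.inspect(SyncCacheSketch.pending_days(~D[2023-01-01], ~D[2023-01-04], cached))
```

Only the two days missing from the assumed cache are returned, so a repeated run would request nothing for the days it has already seen.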

The stored data is in a raw `recenttracks` JSON format,
chunked into `gzip` compressed pages of at most 200 tracks and stored within directories
corresponding to the days when tracks were scrobbled. The file archive resides in a main
directory specified in configuration - see [Configuration](#configuration).
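
The following sketch illustrates the chunking arithmetic and how a compressed page can be read back with the Erlang standard library. Only the 200-track page size and the use of `gzip` come from the description above. The `YYYY/MM/DD` directory layout, the use of UTC for day boundaries and the `ArchiveLayoutSketch` module are assumptions made for illustration.

```elixir
defmodule ArchiveLayoutSketch do
  @moduledoc "Hypothetical helpers; not the LastfmArchive implementation."

  @per_page 200

  @doc "Number of pages needed to hold `track_count` tracks of one day."
  def page_count(track_count, per_page \\ @per_page)
      when track_count >= 0 and per_page > 0 do
    # Integer ceiling division
    div(track_count + per_page - 1, per_page)
  end

  @doc "Assumed day directory for a scrobble Unix timestamp (UTC)."
  def day_dir(unix_seconds) do
    unix_seconds
    |> DateTime.from_unix!()
    |> DateTime.to_date()
    |> Calendar.strftime("%Y/%m/%d")
  end

  @doc "Reads a gzip compressed page and returns the raw JSON binary."
  def read_page(path) do
    path |> File.read!() |> :zlib.gunzip()
  end
end

IO.inspect(ArchiveLayoutSketch.page_count(450))
IO.inspect(ArchiveLayoutSketch.day_dir(1_672_444_800))
```

A day with 450 scrobbles needs three pages under these assumptions. Decoding the returned JSON binary is left to a JSON library of your choice.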

See the [Creating a file archive](#creating-a-file-archive) guide and `sync/2` for various archiving options such as `overwrite`, `year` and `date`.

### Transform into columnar storage formats
You can transform the file archive into other common storage formats such as CSV, and
into columnar data structures such as Apache Parquet.
These formats facilitate data interoperability, as well as
OLAP and analytics use cases.

```elixir
# transform the file archive into columnar Apache Parquet files
LastfmArchive.transform("a_lastfm_user", format: :parquet)

# to columnar Apache Arrow IPC files
LastfmArchive.transform("a_lastfm_user", format: :ipc_stream)

# CSV format also available
LastfmArchive.transform("a_lastfm_user", format: :csv)
```

Available formats:
- CSV (tab-delimited)
- Apache Arrow columnar format
- Apache Parquet columnar format
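
As a rough illustration of what "tab-delimited" means for downstream consumers, the sketch below parses tab-separated text using only the Elixir standard library. It assumes a header row and no quoting or escaping. The `TsvSketch` module and the sample column names and values are hypothetical, and the files produced by the tool may need decompression or a proper CSV parser.

```elixir
defmodule TsvSketch do
  @moduledoc "Hypothetical parser for simple tab-delimited text with a header row."

  @doc "Turns tab-delimited text into a list of maps keyed by header names."
  def parse(text) do
    # Expects at least a header line; empty input raises a MatchError
    [header | rows] =
      text
      |> String.split("\n", trim: true)
      |> Enum.map(&String.split(&1, "\t"))

    Enum.map(rows, fn row -> header |> Enum.zip(row) |> Map.new() end)
  end
end

sample = "id\tartist\talbum\n1\tArtist A\tAlbum X\n"
IO.inspect(TsvSketch.parse(sample))
```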

See the [Columnar data transforms](#columnar-data-transforms) guide.

### Transform into faceted columnar datasets
You can also transform the file archive into faceted (`artists`, `albums`, `tracks`) columnar datasets:

```elixir
LastfmArchive.transform("a_lastfm_user", format: :ipc_stream, facet: :artists)
```

See the [Facets archiving](#facets-archiving) guide.

### Read archive

The tool provides a `read/2`
function for retrieving data from the archive. It mainly relies on
Elixir Explorer data frame mechanisms
to underpin further data I/O, manipulation, analytics and visualisation.

The function returns a lazy `t:Explorer.DataFrame.t/0`.

#### From raw data file archive

Scrobbles stored in the file archive can be read
with a `day` or `month` option:

```elixir
  # read a single-day scrobbles for the configured default user
  LastfmArchive.read(day: ~D[2022-12-31])

  # read a single-month scrobbles for a user with an arbitrary day of a month
  LastfmArchive.read("a_lastfm_user", month: ~D[2022-12-01])
```

#### From columnar archive for analytics

`read/2` can
return scrobbles for a single year or all scrobbles, i.e. **the entire dataset**, from a columnar archive.
A `columns` option is available to retrieve only a column subset.

```elixir
  # load all 2023 data from a Parquet archive
  LastfmArchive.read("a_lastfm_user", format: :parquet, year: 2023)

  # load all data from an Arrow IPC archive
  LastfmArchive.read("a_lastfm_user", format: :ipc_stream)

  # load data from specific columns
  LastfmArchive.read("a_lastfm_user", format: :parquet, columns: [:id, :artist, :album])
```

#### From faceted datasets for analytics
`read/2` can also
return the faceted datasets, e.g. **all artists** from a columnar archive.

```elixir
  LastfmArchive.read("a_lastfm_user", format: :ipc_stream, facet: :artists)
```

## Livebook guides

`LastfmArchive` also provides the following interactive, step-by-step Livebook guides.

### Creating a file archive

The Creating a file archive guide shows how to create a local file archive consisting of data fetched from the API. It provides heatmap and count visualisations for checking ongoing archiving status.

![archiving progress visualisation](assets/img/livebook_heatmap.png)

### Columnar data transforms

The Columnar data transforms guide shows how to transform the local file archive into columnar data formats (Arrow, Parquet). It demonstrates how `read/2` can be used to load single-year, single-column data, as well as an entire dataset, into a data frame for various analytics.

See a sample output of this guide,
showing top tracks analytics.

### Facets archiving

The Facets archiving guide shows how faceted `artists`, `albums` and `tracks` columnar datasets can be generated from the local file archive. It also demonstrates how the datasets may be used, for example finding the new artists discovered on a particular date,

![new artists discovered on this day](assets/img/livebook_new_artists_on_this_day.png)

and visualising all artists, when they were first listened to and their overall popularity.

![all artists first played and popularity](assets/img/livebook_firstplay_bubble_plot.png)
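
To make the idea of an artists facet concrete, here is a minimal sketch in plain Elixir that derives a first-play date and a play count per artist from a list of scrobbles, then finds artists first heard on a given date. The `ArtistFacetSketch` module, the `artist` and `date` field names and the sample data are all hypothetical. The faceted datasets produced by the tool are columnar and may carry different columns.

```elixir
defmodule ArtistFacetSketch do
  @moduledoc "Hypothetical illustration of an artists facet; not the tool's implementation."

  @doc "Builds one entry per artist with first play date and play count."
  def artists(scrobbles) do
    scrobbles
    |> Enum.group_by(& &1.artist)
    |> Enum.map(fn {artist, plays} ->
      %{
        artist: artist,
        # Earliest date on which the artist was scrobbled
        first_play: plays |> Enum.map(& &1.date) |> Enum.min(Date),
        playcount: length(plays)
      }
    end)
    |> Enum.sort_by(& &1.artist)
  end

  @doc "Artists whose first play falls on `date`."
  def new_on(facet, %Date{} = date) do
    Enum.filter(facet, &(Date.compare(&1.first_play, date) == :eq))
  end
end

scrobbles = [
  %{artist: "B", date: ~D[2023-01-02]},
  %{artist: "A", date: ~D[2023-01-01]},
  %{artist: "A", date: ~D[2022-12-31]}
]

facet = ArtistFacetSketch.artists(scrobbles)
IO.inspect(ArtistFacetSketch.new_on(facet, ~D[2023-01-02]))
```

The first-play date and the play count are the two quantities that a "first played versus popularity" plot like the one above would need.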

## Other usage
To load all transformed CSV data from the archive into Solr:

```elixir
  # define a Solr endpoint with %Hui.URL{} struct
  headers = [{"Content-type", "application/json"}]
  url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}

  LastfmArchive.load_archive("a_lastfm_user", url)
```

The function finds CSV files in the archive and sends them to
Solr for ingestion one at a time. It uses the `Hui` client to interact
with Solr and the `t:Hui.URL.t/0` struct for Solr endpoint specification.
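
The "find files, send one at a time" pattern can be sketched with the standard library alone. The `CsvLoadSketch` module, the `.csv` file extension and the callback standing in for the Solr request are assumptions for illustration; this is not how `load_archive/2` is implemented.

```elixir
defmodule CsvLoadSketch do
  @moduledoc "Hypothetical illustration of sequential file loading."

  @doc "Finds CSV files below `dir`, in a stable sorted order."
  def csv_files(dir) do
    dir
    |> Path.join("**/*.csv")
    |> Path.wildcard()
    |> Enum.sort()
  end

  @doc """
  Calls `send_fun` for each CSV file in turn and returns `{file, result}`
  pairs, so that failures can be inspected per file.
  """
  def load_each(dir, send_fun) when is_function(send_fun, 1) do
    dir
    |> csv_files()
    |> Enum.map(fn file -> {file, send_fun.(file)} end)
  end
end

# The callback is a stand-in; a real one would post the file content to Solr
CsvLoadSketch.load_each("./lastfm_data", fn file -> {:ok, Path.basename(file)} end)
|> IO.inspect()
```

Processing files sequentially keeps the load on the search server predictable, at the cost of a longer total run time.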

## Requirement

This tool requires Elixir and Erlang. See their installation details
for various operating systems, or use Livebook.

## Installation

`lastfm_archive` is available in Hex.
The package can be installed by adding `lastfm_archive`
to your list of dependencies in `mix.exs`:

```elixir
  def deps do
    [
      {:lastfm_archive, "~> 1.2"}
    ]
  end
```

Documentation can be found at [](

## Configuration
Add the following entries to your config, `config/config.exs`. For example,
the following specifies a Lastfm `user` and a main file location for
multiple user archives, `./lastfm_data/`, relative to the software home directory.

You also need to specify a `lastfm_api_key` in the config, so that the application can
access the Lastfm API:

```elixir
  config :lastfm_archive,
    user: "default_user", # the default user
    data_dir: "./lastfm_data/", # main directory for multiple archives
    lastfm_api_key: "api_key_provided_by_lastfm",
    per_page: 200, # 200 is max no. of tracks per call permitted by Lastfm API
    interval: 1000 # milliseconds between requests cf. Lastfm's max 5 reqs/s rate limit

  # optional: Solr endpoint for Lastfm data loading
  config :hui, :lastfm_archive,
    url: "http://localhost:8983/solr/lastfm_archive",
    handler: "update",
    headers: [{"Content-type", "application/json"}]
```

See `sync/2`
for other configurable archiving options, e.g. `interval`, `per_page`.
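
When tuning `interval`, it can help to check the resulting request rate against the limit of 5 requests per second mentioned in the configuration comments. The sketch below does this arithmetic with a hypothetical `RateLimitSketch` module. It reads `interval` from the application environment with a fallback of 1000 ms and ignores request latency, so the real rate would be somewhat lower.

```elixir
defmodule RateLimitSketch do
  @moduledoc "Hypothetical check of an interval setting against a rate limit."

  # Limit quoted in the configuration comments
  @max_requests_per_second 5

  @doc "Upper bound on requests per second for a given pause between requests."
  def requests_per_second(interval_ms) when interval_ms > 0 do
    1000 / interval_ms
  end

  @doc "True when the interval keeps the request rate within the limit."
  def within_limit?(interval_ms \\ Application.get_env(:lastfm_archive, :interval, 1000)) do
    requests_per_second(interval_ms) <= @max_requests_per_second
  end
end

IO.inspect(RateLimitSketch.within_limit?(1000))
IO.inspect(RateLimitSketch.within_limit?(100))
```

Under these assumptions, an interval of 1000 ms stays well inside the limit, whereas 100 ms (10 requests per second) would exceed it.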

See `Hui` for more details on Solr configuration.

A `lastfm_api_key` must be configured to enable Lastfm API requests;
see "Get an API account" in the Lastfm API documentation.