# Lastfm Archive [![Build Status](https://github.com/boonious/lastfm_archive/actions/workflows/elixir.yml/badge.svg)](https://github.com/boonious/lastfm_archive/actions/workflows/elixir.yml) [![Hex pm](http://img.shields.io/hexpm/v/lastfm_archive.svg?style=flat)](https://hex.pm/packages/lastfm_archive) [![Coverage Status](https://coveralls.io/repos/github/boonious/lastfm_archive/badge.svg)](https://coveralls.io/github/boonious/lastfm_archive?branch=master)
A tool for extracting and archiving Last.fm music listening data - [scrobbles](https://www.last.fm/about/trackmymusic).
**Note**: previous analytics features have been migrated to [coda](https://github.com/boonious/coda).
This includes the [on this day Livebook](https://github.com/boonious/coda#livebook).
Please check out [coda](https://github.com/boonious/coda) for more Last.fm listening data analytics
in the future.
## Usage
Download and create a file archive of Lastfm scrobble tracks via an [Elixir](https://elixir-lang.org)
application or [interactive Elixir](https://elixir-lang.org/getting-started/introduction.html#interactive-mode)
by invoking `iex -S mix` command line action while in software home directory.
```elixir
# archive all data of a default user specified in configuration
LastfmArchive.sync # subsequent calls download only latest scrobbles
# archive all data of any Lastfm user
# the data is stored in directory named after the user
LastfmArchive.sync("a_lastfm_user")
```
You can also deploy and use the tool in [Livebook](https://livebook.dev), check out:
- [Guides for archiving and data transforms](#livebook-guides)
Scrobbles are downloaded via the Last.fm API and stored in the file archive on demand
and on a daily basis. The software has a built-in cache to remember
and resume from the previous downloads. It skips already downloaded scrobbles and
does not make further requests to the API.
The stored data is in a raw Last.fm `recenttracks` JSON format,
chunked into 200-track (max) `gzip` compressed pages and stored within directories
corresponding to the days when tracks were scrobbled. The file archive in a main
directory specified in configuration - see below.
See [`sync/2`](https://hexdocs.pm/lastfm_archive/LastfmArchive.html#sync/2) for
various archiving options such `overwrite`, `year`, `date`.
### Transform into columnar storage formats
You can transform the file archive into other common storage formats such as CSV and
columnar data structure such as [Apache Parquet](https://parquet.apache.org).
These formats facilitate data interoperability, as well as
[OLAP](https://www.snowflake.com/guides/olap-vs-oltp), analytics use cases.
```elixir
# transform the file archive into columnar Apache Parquet files
LastfmArchive.transform("a_lastfm_user", format: :parquet)
# to columnar Apache Arrow IPC files
LastfmArchive.transform("a_lastfm_user", format: :ipc_stream)
# CSV format also available
LastfmArchive.transform("a_lastfm_user", format: :csv)
```
Available formats:
- CSV (tab-delimited)
- [Apache Arrow](https://arrow.apache.org) columnar format
- [Apache Parquet](https://parquet.apache.org) columnar format
See [`transform/2`](https://hexdocs.pm/lastfm_archive/LastfmArchive.html#transform/2).
### Read archive
The tool provides a [`read/2`](https://hexdocs.pm/lastfm_archive/LastfmArchive.html#read/2)
function for retrieving data from the archive. It mainly relies on
[Elixir Explorer](https://github.com/elixir-explorer/explorer) data frame mechanisms
to underpin further data i/o, manipulation, analytics and visualisation.
The function returns a lazy `t:Explorer.DataFrame.t/0`.
#### From raw data file archive
Scrobbles stored in the file archive can be read
with a `day` or `month` option:
```elixir
# read a single-day scrobbles for the configured default user
LastfmArchive.read(day: ~D[2022-12-31])
# read a single-month scrobbles for a user with an arbitrary day of a month
LastfmArchive.read("a_lastfm_user", month: ~D[2022-12-01])
```
#### From columnar archive for analytics
More data can be loaded from existing columnar archive.
[`read/2`](https://hexdocs.pm/lastfm_archive/LastfmArchive.html#read/2) can be used to
return a single-year or all scrobbles, i.e. **the entire dataset**.
A `columns` option is available to retrieve only a column subset.
```elixir
# load all 2023 data from a Parquet archive
LastfmArchive.read("a_lastfm_user", format: :parquet, year: 2023)
# load all data from an Arrow IPC archive
LastfmArchive.read("a_lastfm_user", format: :ipc_stream)
# load data from specific columns
LastfmArchive.read("a_lastfm_user", format: :parquet, columns: [:id, :artist, :album])
```
See [`read/2`](https://hexdocs.pm/lastfm_archive/LastfmArchive.html#read/2).
## Livebook guides
`LastfmArchive` also provides the following [Livebook](https://livebook.dev) interactive and step-by-step guides:
- [Creating a file archive](https://hexdocs.pm/lastfm_archive/archiving.html) guide for creating a local file archive consisting data fetched from the Last.fm API. It provides a heatmap and count visualisation for checking ongoing archiving status.
[![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fboonious%2Flastfm_archive%2Fmaster%2Flivebook%2Farchiving.livemd)
![archiving progress visualisation](assets/img/livebook_heatmap.png)
- [Columnar data transforms](https://hexdocs.pm/lastfm_archive/transforming.html) guide for transforming the local file archive to columnar data formats (Arrow, Parquet). It demonstrates how `read/2` can be used to load single-year single-column data, as well as an entire dataset into data frame for various analytics.
[![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fboonious%2Flastfm_archive%2Fmaster%2Flivebook%2Ftransforming.livemd)
![unique tracks by artists analytics](assets/img/livebook_unique_tracks_analytics.png)
## Other usage
To load all transformed CSV data from the archive into Solr:
```elixir
# define a Solr endpoint with %Hui.URL{} struct
headers = [{"Content-type", "application/json"}]
url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}
LastfmArchive.load_archive("a_lastfm_user", url)
```
The function finds CSV files from the archive and send them to
Solr for ingestion one at a time. It uses `Hui` client to interact
with Solr and the `t:Hui.URL.t/0` struct for Solr endpoint specification.
## Requirement
This tool requires Elixir and Erlang, see [installation](https://elixir-lang.org/install.html) details
for various operating systems or [Livebook](https://livebook.dev).
## Installation
`lastfm_archive` is [available in Hex](https://hex.pm/packages/lastfm_archive),
the package can be installed by adding `lastfm_archive`
to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:lastfm_archive, "~> 1.1"}
]
end
```
Documentation can be found at [https://hexdocs.pm/lastfm_archive](https://hexdocs.pm/lastfm_archive).
## Configuration
Add the following entries in your config - `config/config.exs`. For example,
the following specifies an Lastfm `user` and a main file location for
multiple user archives, `./lastfm_data/` relative to the software home directory.
You also need to specify an `lastfm_api_key` in the config, so that the application can
[access Lastfm API](https://www.last.fm/api/authentication).
```elixir
config :lastfm_archive,
user: "default_user", # the default user
data_dir: "./lastfm_data/", # main directory for multiple archives
lastfm_api_key: "api_key_provided_by_lastfm",
per_page: 200, # 200 is max no. of tracks per call permitted by Lastfm API
interval: 1000 # milliseconds between requests cf. Lastfm's max 5 reqs/s rate limit
# optional: Solr endpoint for Lastfm data loading
config :hui, :lastfm_archive,
url: "http://localhost:8983/solr/lastfm_archive",
handler: "update",
headers: [{"Content-type", "application/json"}]
```
See [`sync/2`](https://hexdocs.pm/lastfm_archive/LastfmArchive.html#sync/2)
for other configurable archiving options, e.g. `interval`, `per_page`.
See [`Hui`](https://hexdocs.pm/hui/readme.html#content) for more details on Solr configuration.
An `api_key` must be configured to enable Lastfm API requests,
see [https://www.last.fm/api](https://www.last.fm/api) ("Get an API account").