LISTener's Friend
A flexible, interactive program that converts album lists to Spotify playlists
Ingredients: Python, API, OAuth, Recursion, File I/O, Web Scraping, Selenium, Headless-Chrome, BeautifulSoup, HTML Parsing
Below you will find the current contents of the project README to which I've added some snippets of the relevant code. If you would like a closer look at the program and how it is written please take a look at the repo linked above.
Why? When exploring new music I've always preferred listening to albums rather than "top" tracks or algorithmically generated playlists. The problem is that building playlists manually, say from a list like this, takes a lot of copy-paste-searching and click-n-dragging, so I built this to make the process faster and easier.
Getting Started
- Clone or download a copy of this repo.
- Install the required dependencies:
pip install python-dotenv spotipy selenium bs4
- Rename the included example.env to .env, change the default input file path if desired, and update it with your API credentials.
- Run the program:
python listeners_friend.py
- Select the type of input source from the given options:
  - Option 1: Add a list of albums in the format artist - album, each on a new line, to input.txt
  - Options 2, 6, 8: Provide a URL for a supported source
  - Options 3, 4: No user input needed, fully automated
  - Options 5, 7: Interactive user input selection
- A new playlist will be created directly in your Spotify account, ready to play!
How It Works
Basics
- User is authenticated with the Spotify API, either using the API credentials stored in the .env file or an existing token stored in the .cache file
- The chosen data source is read and the necessary information is parsed into an array of tuples in the form [(artist, album)], which serves as the finalized input
- A playlist name and description are either provided by the user (op. 1), taken directly from the data source (op. 2), or auto-generated (op. 3)
- The finalized input is looped through and the Spotify library is queried for each entry with q="album:{album} artist:{artist}", limited to the album data type with a maximum result quantity of 1 so that only the top matching result (if any) is grabbed
- The album match data is parsed for its URI string
- Albums which aren't found are added to a separate list
- The API is queried again using the album URI to fetch the tracklist data
- Tracklist data is parsed for each individual track's URI
- Once tracks have been gathered for every album URI, a count is taken of the albums not found and this is prepended to the playlist description in the form "### {albums or tracks} not found."
- A text file is saved to the current directory which contains the artist - {album or track} info for each album or track not found
- The necessary info is sent to the user's account for the creation of a new playlist and its ID is retrieved
- Track URIs are posted to the new playlist
# utils.py
import re
import spotipy

# `today` is not defined in this snippet; it is assumed to be a module-level YYYY-MM-DD date string

# Write the missing items to a text file and prepend their count to the playlist description
def handle_missing(missing, playlist_name, playlist_description, item_type):
    # Replace characters that are not allowed in filenames
    clean_playlist_name = re.sub(r"[/\\?%*:|\"<>\x7F\x00-\x1F]", "-", playlist_name)
    with open(f"not_found_for_{clean_playlist_name}_{today}.txt", "w", encoding="utf-8") as file:
        file.write("Not Found:\n")
        for item in missing:
            file.write(item + "\n")
    # Pluralize the label when more than one item is missing
    if len(missing) > 1:
        item_type += "s"
    playlist_description = f"{len(missing)} {item_type} not found. " + playlist_description
    print(f"Info on missing {item_type} saved to text file in directory...")
    return playlist_description
def album_search(playlist_name, playlist_description, input_list, spotify):
    # Initialized locally here so the snippet is self-contained
    album_uris = []
    missing = []
    # Search for each album
    print("Searching Spotify library for albums...")
    for artist, album in input_list:
        # Album-specific search using Spotify's "album:{query}" format
        # Returns only the top result
        result = spotify.search(q=f"album:{album} artist:{artist}", type="album", limit=1)
        if result['albums']['items']:
            album_uris.append(result['albums']['items'][0]['uri'])
        else:
            missing.append(f"{artist} - {album}")
    if len(missing) > 0:
        print(f"{len(missing)} albums weren't found...")
        playlist_description = handle_missing(missing, playlist_name, playlist_description, "album")
    track_uris = track_search(spotify, album_uris)
    return track_uris, playlist_description, (len(input_list) - len(missing))
# Take album URIs (or artist/title pairs) and return track URIs
def track_search(spotify, album_uris=None, track_input=None, playlist_name=None, playlist_description=None):
    # Initialized locally here so the snippet is self-contained
    track_uris = []
    missing = []
    if album_uris:
        print("Searching for associated tracks...")
        # Get the tracks for each album URI
        for album in album_uris:
            tracks = spotify.album_tracks(album)['items']
            # Get the URI for each track
            for track in tracks:
                track_uris.append(track['uri'])
        print(f"{len(album_uris)} albums have been converted to {len(track_uris)} tracks...")
        return track_uris
    elif track_input:
        print("Searching Spotify library for tracks...")
        for track in track_input:
            artist, title = track
            result = spotify.search(q=f"artist:{artist} track:{title}", type="track", limit=1)
            if result['tracks']['items']:
                uri = result['tracks']['items'][0]['uri']
                if uri:
                    track_uris.append(uri)
            else:
                missing.append(artist + " - " + title)
        if len(missing) > 0:
            print(f"{len(missing)} tracks weren't found...")
            playlist_description = handle_missing(missing, playlist_name, playlist_description, "track")
        return track_uris, playlist_description
# Called when track_uris exceeds 11,000 (the max playlist size): split into two playlists
# You don't need more than two... right? 22,000 tracks seems like an OK maximum
def giant_check(track_uris, playlist_description, playlist_name, spotify):
    print('Uh oh, this list is too big to contain in a single playlist!')
    print(f"Splitting {len(track_uris)} tracks in two...")
    overflow_track_uris = track_uris[11000:]
    max_track_uris = track_uris[:11000]
    overflow_playlist_name = playlist_name + " (Part 2)"
    playlist_name = playlist_name + " (Part 1)"
    print(f"Creating {playlist_name}, which contains 11,000 tracks")
    create_playlist(max_track_uris, playlist_description, playlist_name, spotify)
    print(f"Creating {overflow_playlist_name}, which contains {len(overflow_track_uris)} tracks")
    create_playlist(overflow_track_uris, playlist_description, overflow_playlist_name, spotify)
# Break potentially huge list of tracks into easily manageable chunks
def chunk_list(lst, chunk_size):
    for i in range(0, len(lst), chunk_size):
        yield lst[i:i + chunk_size]

# Take track URIs and final playlist title & description
def create_playlist(track_uris, playlist_description, playlist_name, spotify):
    # Split into two playlists if maximum track count is hit
    if len(track_uris) > 11000:
        # giant_check creates both playlists itself and returns nothing, so stop here
        giant_check(track_uris, playlist_description, playlist_name, spotify)
        return
    user_id = spotify.current_user()["id"]
    if track_uris:
        playlist = spotify.user_playlist_create(user_id, playlist_name, public=True, description=playlist_description)
        print(f"Playlist {playlist_name} has been created...")
        track_uris = [uri for uri in track_uris if uri is not None]
        # Split to chunks of 100 tracks, the max allowed by the API in a single post
        for chunk in chunk_list(track_uris, 100):
            try:
                spotify.playlist_add_items(playlist['id'], chunk)
            except spotipy.exceptions.SpotifyException as e:
                print(f"An error occurred: {e}")
    else:
        print("List of track URIs is empty, nothing added to playlist.")
Spotify API Limitations
- Only individual tracks or podcast episodes, not albums, can be added to playlists, which is why URIs for each individual track on an album are collected rather than just posting the album URI to the playlist
- Only 100 tracks can be posted to a playlist at a time, so large lists of track URIs are broken into chunks of 100 or fewer before being sent
- A Spotify playlist can have a maximum of 11,000 tracks, which you can certainly run up against if using a huge RYM list. In those cases two playlists are created: {playlist name} (Part 1) with the first 11,000 tracks and {playlist name} (Part 2) with the overflow.
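The two limits above boil down to simple slicing arithmetic. The sketch below is not the program's own code: it is a hypothetical generalization (the names split_into_playlists and count_posts are made up for illustration) that splits a track list into as many parts as needed rather than stopping at two, using the 11,000 and 100 figures quoted in this README.

```python
from typing import Iterator, List, Tuple

MAX_PLAYLIST_SIZE = 11_000  # playlist cap quoted in this README
MAX_POST_SIZE = 100         # per-request cap quoted in this README


def split_into_playlists(
    track_uris: List[str],
    playlist_name: str,
    max_size: int = MAX_PLAYLIST_SIZE,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (name, uris) pairs, one per playlist needed to hold every track."""
    if len(track_uris) <= max_size:
        # Everything fits, so keep the original name
        yield playlist_name, track_uris
        return
    for part, start in enumerate(range(0, len(track_uris), max_size), start=1):
        yield f"{playlist_name} (Part {part})", track_uris[start:start + max_size]


def count_posts(n_tracks: int, post_size: int = MAX_POST_SIZE) -> int:
    """Number of add-items requests needed for n_tracks (ceiling division)."""
    return -(-n_tracks // post_size)
```

With these assumptions, 25,000 tracks would become three playlists (11,000, 11,000, and 3,000 tracks), and a full 11,000-track playlist would take 110 requests to fill.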
Authorization
It is necessary to have Spotify API credentials stored in the .env file. If you need help, see the guide at the end of this doc for details.
The first time you run the program you'll be sent to a Spotify authorization page in your browser. It should ask whether you want to allow connecting to { whatever you named your app when getting your API credentials }. After this you'll be routed to your Redirect URI. Copy the full URL and paste it into the command prompt to finalize authorization. Your OAuth token will be stored in the .cache file.
import os
from dotenv import load_dotenv
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Load the API credentials from the .env file into the environment
load_dotenv()
spotify = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id=os.getenv("SPOTIPY_CLIENT_ID"),
    client_secret=os.getenv("SPOTIPY_CLIENT_SECRET"),
    redirect_uri=os.getenv("SPOTIPY_REDIRECT_URI"),
    scope="playlist-modify-public"
))
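If authorization fails right away, a common cause is a variable missing from .env. The following is a small, hypothetical pre-flight check (missing_env_vars is not part of the program) that reports which of the three credential variables read by the snippet above are unset or empty.

```python
import os

# Variable names taken from the authorization snippet in this README
REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]
```

Calling missing_env_vars() after load_dotenv() returns an empty list when everything is in place; otherwise the returned names point at the lines to fix in .env.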
Options
# utils.py
def display_options(options):
    # Check if this is just a bunch of strings to print or something more complex
    if isinstance(options[0], str):
        for idx, option in enumerate(options, start=1):
            print(f"{idx}. {option}")
    # Right now this is just for displaying lists of NTS Episodes
    else:
        for idx, option in enumerate(options, start=1):
            tags = " #".join(str(tag) for tag in option["tags"])
            print(f"{idx}. {option['date']} {option['title']}, {option['location']} #{tags}")

def get_user_selection(options):
    # Keep prompting until the user enters a valid option number
    while True:
        display_options(options)
        try:
            print()
            selected_option = int(input("Please select an option: "))
            if 1 <= selected_option <= len(options):
                return selected_option
            else:
                print(f"Invalid selection. Please choose a number between 1 and {len(options)}.")
        except ValueError:
            print("Invalid input. Please enter a number.")
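To make the episode branch of display_options concrete, here is a minimal sketch of the line it builds for one entry. The helper name format_episode and the sample values are hypothetical; the dictionary keys (date, title, location, tags) are the ones the function above reads. Unlike the original, this sketch leaves off the trailing "#" when an episode has no tags.

```python
def format_episode(idx, option):
    """Build one numbered menu line for an episode dictionary."""
    line = f"{idx}. {option['date']} {option['title']}, {option['location']}"
    if option["tags"]:
        # Prefix every tag with "#", separated by spaces
        line += " " + " ".join(f"#{tag}" for tag in option["tags"])
    return line


# Sample data is made up for illustration
episode = {"date": "2024-03-09", "title": "Some Show", "location": "London", "tags": ["ambient", "jazz"]}
print(format_episode(1, episode))
```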
Option 1: Use txt file
- Filepath is read from the .env file, default is ./input.txt
- Input in the form of artist - album \n is parsed from the text file into the finalized input array
- User is prompted to enter a name and an (optional) description for the new playlist
# main.py
if selected_option == 1:
    # Each non-empty line is split once on " - " into an (artist, album) tuple
    with open(os.getenv("INPUT_PATH"), "r", encoding="utf-8") as file:
        input_list = [tuple(line.strip().split(" - ", 1)) for line in file if line.strip()]
    playlist_name = input("Enter playlist name: ")
    playlist_description = input("Enter playlist description: ")
# ... snip ...
if output_type == 'albums':
    print("Creating an album-based playlist")
    track_uris, playlist_description, count = album_search(playlist_name, playlist_description, input_list, spotify)
# ... snip ...
create_playlist(track_uris, playlist_description, playlist_name, spotify)
print(f"Playlist \"{playlist_name}\" has been successfully created!")
if output_type == 'tracks':
    print(f"It contains {len(track_uris)} tracks!")
elif output_type == 'albums':
    print(f"It contains {count} albums for a total of {len(track_uris)} tracks!")
print("Get to listening!")
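One thing to watch with the list comprehension above: a line without the " - " separator yields a one-element tuple, which would fail when album_search unpacks it into artist and album. The sketch below is a hypothetical, stricter parser (parse_album_lines is not part of the program) that sets such lines aside instead.

```python
from typing import Iterable, List, Tuple


def parse_album_lines(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split lines of 'artist - album' into tuples; collect malformed lines separately."""
    entries: List[Tuple[str, str]] = []
    skipped: List[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        # partition splits on the first " - " only, like split(" - ", 1)
        artist, sep, album = text.partition(" - ")
        if sep and artist.strip() and album.strip():
            entries.append((artist.strip(), album.strip()))
        else:
            skipped.append(text)
    return entries, skipped
```

It accepts any iterable of strings, so an open file object can be passed directly; the skipped list can then be printed so typos in input.txt are easy to spot.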
Options 2-8: Scraped Web Data Input
Lists hosted on supported websites can be scraped using Selenium and BeautifulSoup to build the finalized input array
# scraper.py
import os
from dotenv import load_dotenv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

load_dotenv()

def get_soup(url):
    print("Fetching data...")
    # Configure a headless Chrome session with the user agent from .env
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument(f"user-agent={os.getenv('USER_AGENT')}")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(service=Service(), options=chrome_options)
    try:
        driver.get(url)
        html = driver.page_source
    finally:
        # Always close the browser, even if the page load fails
        driver.quit()
    return BeautifulSoup(html, 'html.parser')
If you are running the program in Bash you may have difficulty entering a URL for options 2 or 8!
Option 2: Use RateYourMusic List URL
- The URL for a list at RateYourMusic can be supplied as an input source
- Playlist name is automatically set to the name of the list and description is set to the list description (truncated if it runs beyond Spotify's 300 character limit)
- The program will iterate through multi-page lists but you may experience a premature end due to RYM's use of the occasional random CloudFlare captcha
  - A way to circumvent this is still to come; for now, just retry or pass in the URL of the page where it stopped
# option_handlers.py
def handle_rym_list(soup):
    # `page` and `input_list` are module-level state shared across recursive calls
    global page
    print(f"Processing page {page}...")
    playlist_name = soup.find('h1').get_text(strip=True)
    playlist_description = soup.find('span', class_='rendered_text')
    if playlist_description is None:
        playlist_description = ''
    else:
        playlist_description = playlist_description.get_text(strip=True)
    # Truncate long descriptions to fit within Spotify's 300 character limit
    # Also taking into consideration the 19-21 chars of "### items not found. " prepended later
    playlist_description = (playlist_description[:276] + '...') if len(playlist_description) > 279 else playlist_description
    # Look for a "next page" link in the bottom navigation
    has_next_page = False
    next_url = None
    nav_div = soup.find('div', id='nav_bottom')
    if nav_div:
        nav_span = nav_div.find('span', class_='navspan')
        if nav_span:
            navlink_next = nav_span.find('a', class_='navlinknext')
            if navlink_next:
                next_url = navlink_next['href']
    if next_url:
        next_url = "https://rateyourmusic.com" + next_url
        has_next_page = True
    # Collect (artist, album) pairs from the list table
    list_table = soup.find('table', id='user_list')
    if list_table is None:
        print(f"List not found on page {page}, potential captcha block")
    else:
        for row in list_table.find_all('tr'):
            artist_tag = row.find('a', class_='list_artist')
            album_tag = row.find('a', class_='list_album')
            if not artist_tag or not album_tag:
                continue
            artist = artist_tag.get_text(strip=True)
            album = album_tag.get_text(strip=True)
            input_list.append((artist, album))
    # Recurse into the next page; its entries accumulate in the shared input_list
    if has_next_page:
        page = page + 1
        next_bowl_of_soup = get_soup(next_url)
        handle_rym_list(next_bowl_of_soup)
    return playlist_name, playlist_description, input_list
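The 276 and 279 in the truncation line above come from a small budget calculation: 300 characters allowed, minus 21 reserved for the "not found" prefix, minus 3 for the ellipsis. The sketch below spells that out as a standalone helper; truncate_description and its parameter names are hypothetical, and the 300 and 21 figures are the ones stated in this README.

```python
SPOTIFY_DESCRIPTION_LIMIT = 300  # limit quoted in this README
NOT_FOUND_PREFIX_RESERVE = 21    # room for e.g. "99 albums not found. "


def truncate_description(text, limit=SPOTIFY_DESCRIPTION_LIMIT,
                         reserve=NOT_FOUND_PREFIX_RESERVE, ellipsis="..."):
    """Shorten text so that a prefix of `reserve` characters still fits within `limit`."""
    budget = limit - reserve  # 279 with the defaults
    if len(text) <= budget:
        return text
    # Leave room for the ellipsis inside the budget (cut at 276 with the defaults)
    return text[:budget - len(ellipsis)] + ellipsis
```

Note that a three-digit count such as "100 albums not found. " is 22 characters long, so raising reserve to 22 may be a safer choice for very large lists.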
Option 3: Use Current Boomkat Bestsellers List
- The Boomkat Bestsellers list is used as input source
- Playlist name is automatically set to Boomkat Bestsellers and the description is set to For the week ending YYYY-MM-DD, where the date is the day the data was gathered
- Timespan default is one week but can be changed by modifying the (hardcoded) source URL https://boomkat.com/bestsellers?q[release_date]=last-week, where last-week can be replaced with last-month or last-year.
# option_handlers.py
def handle_boomkat(soup):
    print(f"Processing Boomkat Bestsellers list for the week ending {today}...")
    table = soup.find('div', class_='bestsellers')
    # Only search inside the container if it was actually found
    bestsellers_list = table.find('ol', class_='bestsellers-list') if table else None
    if table is None:
        print("Table not found")
    elif bestsellers_list is None:
        print("List not found")
    else:
        for item in bestsellers_list.find_all('li', class_='bestsellers-item'):
            # The first link holds the artist, the second the album title
            links = item.find('div', class_='product-name').find_all('a')
            artist = links[0].text.strip().title()
            album = links[1].text.strip()
            input_list.append((artist, album))
    playlist_name = "This Week's Boomkat Bestsellers"
    playlist_description = "For the week ending " + today
    return playlist_name, playlist_description, input_list
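Since the timespan is part of the source URL, switching it could be wrapped in a tiny helper instead of editing the hardcoded string. This is only a sketch: boomkat_bestsellers_url is a hypothetical name, and the three accepted values are the ones listed for this option above.

```python
BOOMKAT_BESTSELLERS_URL = "https://boomkat.com/bestsellers?q[release_date]={timespan}"
TIMESPANS = ("last-week", "last-month", "last-year")


def boomkat_bestsellers_url(timespan="last-week"):
    """Return the bestsellers URL for one of the timespans named in this README."""
    if timespan not in TIMESPANS:
        raise ValueError(f"timespan must be one of {TIMESPANS}, got {timespan!r}")
    return BOOMKAT_BESTSELLERS_URL.format(timespan=timespan)
```

The result could then be passed to get_soup. If the timespan is changed, the playlist name and "For the week ending" description would presumably need adjusting as well.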
Option 4: Use Current Forced Exposure Bestsellers List
- The Forced Exposure Bestsellers list is used as input source
- Playlist name is automatically set to Forced Exposure Bestsellers and the description is set to As of YYYY-MM-DD, where the date is the day the data was gathered
- Forced Exposure doesn't offer any alternate timespans and it's not clear how often the list is updated
(Code to handle this is conceptually the same as the above Boomkat list handler with modifications to handle the differing page structure)
Options 5 & 6: Browse and Select from WFMU's "Heavy Play" Archive
- Browse the WFMU "Heavy Play" archive by date
- Provide a year and receive a list of dated lists to select as an input source
  - Current Limitations:
    - 1987-1996 lists are only available as PDF downloads
    - 1997 to late 2018 vary in format and structure and have inconsistent support at the moment
- Generates a playlist name WFMU Heavy Play {Month DD, YYYY} and a description {#} albums not found.
- Option 6 offers the ability to skip the browsing and provide a list URL directly (date limitations still apply!)
These can be pretty huge (as in a few thousand songs) so they take a little longer to build than other options and the playlists themselves can be a little slow in your Spotify client!
# option_handlers.py
import re

def handle_wfmu_latest(soup):
    year = input("For what year (2014-present): ")
    print("Select a date: ")
    # List available dates for selected year as YYYY-MM-DD
    for a_tag in soup.find_all("a", class_="playlist"):
        href = a_tag.get("href")
        match = re.search(r"/(\d{4})/", href)
        if match:
            url_year = match.group(1)
            if url_year == year:
                date_match = re.search(r"(\d{4}-\d{2}-\d{2})\.html", href)
                if date_match:
                    date = date_match.group(1)
                    print(date)
    date = input("Enter selection as YYYY-MM-DD: ")
    sub_url = "http://blogfiles.wfmu.org/BT/Airplay_Lists/" + year + "/" + date + ".html"
    return handle_wfmu_list(get_soup(sub_url))

def handle_wfmu_list(soup):
    # The list date follows a fixed-length prefix in the page title
    date = soup.title.text[28:]
    print(date)
    playlist_name = "WFMU Heavy Play " + date
    playlist_description = ""
    for ul in soup.find_all('ul'):
        # Only lists containing bolded entries hold album data
        if any(li.find('strong') for li in ul.find_all('li')):
            for li in ul.find_all('li'):
                # Entries look like "ARTIST - Album Title (Label)"
                match = re.match(r'^(.*?) - (.*?) \((.*?)\)$', li.text)
                if match:
                    artist_name, album_title, record_label = match.groups()
                    input_list.append((artist_name.title(), album_title))
    return playlist_name, playlist_description, input_list
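To see what the entry pattern in handle_wfmu_list extracts, the sketch below applies the same regular expression to a single line of text. The helper name parse_wfmu_entry and the sample line are made up for illustration, and stripping surrounding whitespace before matching is an added assumption.

```python
import re

# Same pattern as in handle_wfmu_list: "artist - album (label)"
ENTRY_PATTERN = re.compile(r'^(.*?) - (.*?) \((.*?)\)$')


def parse_wfmu_entry(text):
    """Return (artist, album) for a matching entry line, or None if it does not match."""
    match = ENTRY_PATTERN.match(text.strip())
    if not match:
        return None
    artist, album, _label = match.groups()
    # title() normalizes all-caps artist names, as in the handler above
    return artist.title(), album


print(parse_wfmu_entry("ALICE COLTRANE - Journey in Satchidananda (Impulse!)"))
```

Lines that lack a trailing parenthesized label return None, which mirrors how the handler silently skips them.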
Options 7 & 8: Browse and Select from Recent NTS Radio Broadcasts
- Differs from the standard options in that this is a track-based playlist builder
- Browse the latest from NTS Radio
  - Returns the 12 most recent broadcasts, including date, name of the program, broadcast location, and any tags
- Generates a playlist name NTS: {broadcast title} ({dd.mm.yy}) and description {#} tracks not found. {original episode description}. Broadcast: {YYYY-MM-DD}, {location}
- Number of episodes returned can be modified by changing the value of limit=12 at the beginning of the handle_nts_latest function.
- Option 8 offers the option to skip the browsing and provide an episode URL directly
(Code to handle this is conceptually the same as the above WFMU list handler with modifications to handle the differing page structures)
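As a rough illustration of the naming scheme described for the NTS options, the sketch below formats a playlist name and description from an episode's details. The function names and the use of a datetime.date for the broadcast date are assumptions, not the program's actual code; the "{#} tracks not found. " prefix is left out because handle_missing prepends it later.

```python
from datetime import date


def nts_playlist_name(title, broadcast_date):
    """Format 'NTS: {broadcast title} ({dd.mm.yy})'."""
    return f"NTS: {title} ({broadcast_date.strftime('%d.%m.%y')})"


def nts_playlist_description(episode_description, broadcast_date, location):
    """Format '{original episode description}. Broadcast: {YYYY-MM-DD}, {location}'."""
    return f"{episode_description}. Broadcast: {broadcast_date.isoformat()}, {location}"


# Sample values are made up for illustration
print(nts_playlist_name("Some Show", date(2024, 3, 9)))
print(nts_playlist_description("Two hours of ambient", date(2024, 3, 9), "London"))
```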
How to Get Spotify API Credentials
To get the necessary info for your .env file you'll first need a (free) Spotify Developer account.
- After logging in and landing on the dev dashboard click Create app.
- Fill out the required fields:
  - Give your app a name (e.g. Text-to-Playlist App) and a brief description, maybe something to remind you why you made it.
  - For the Redirect URI you can supply your own or just use https://example.org/callback. Click Add.
  - Check the box for Web API access and save.
- After creating the app you'll be taken to its dashboard. Click Settings in the top right corner. Everything you need for your .env file is here on this page.
- Copy the Client ID and Client Secret (click View client secret) to your .env file. If you forgot what Redirect URI you chose earlier you can also grab that from here. The example.env is prepopulated with https://example.org/callback.
- You're ready to start building playlists!