Penguin Cafe Orchestra insights with

  • 21 May 2021
  • 1 reply


Please find the notebook version of this thread here.


Let's build a small application to investigate one of my favourite artists. They are called "The Penguin Cafe Orchestra" and if you don't know them you are going to find out what they are about, thanks to NL API.

Our dataset: a list of reviews of their albums that I took from Piero Scaruffi's website and saved in a dedicated folder.
Our goal: to understand more about an artist using album reviews.
Our practical goal: to see how NL API works and what it can do.


What is The Penguin Cafe Orchestra about?

First, let's see what emerges from the reviews just by analysing the words used in them. We'll start by concatenating all the reviews into one variable, so that we have a single review of the whole artist. Then we'll look at the most frequent words in it, hoping they reveal more about the Penguin Cafe Orchestra.

### Code for iterating over the artist's folder and concatenating the albums' reviews into one single artist review
import os

artist_review = ''
artist_path = 'penguin_cafe_orchestra'
albums = os.listdir(artist_path)

for album in albums:
    album_path = os.path.join(artist_path, album)
    # Read the whole review file for this album
    with open(album_path, 'r', encoding='utf8') as file:
        review = file.read()
    artist_review += review
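A side note on the loop above: later in the thread the album names are obtained by cutting the last four characters off each file name, which assumes every file in the folder ends in ".txt". Here is a minimal sketch of an alternative using pathlib, under that same assumption. The helper names load_reviews and concatenate_reviews are mine, not part of any library.

```python
from pathlib import Path


def load_reviews(folder):
    """Return {album_name: review_text} for every .txt file in folder.

    Assumption: each review is stored as a UTF-8 .txt file named after the album.
    """
    reviews = {}
    # sorted() gives a stable order regardless of the file system
    for path in sorted(Path(folder).glob("*.txt")):
        # path.stem is the file name without its extension
        reviews[path.stem] = path.read_text(encoding="utf8")
    return reviews


def concatenate_reviews(reviews):
    """Join all album reviews into one artist-level review."""
    return "\n".join(reviews.values())
```

With this, `reviews = load_reviews('penguin_cafe_orchestra')` would give both the album names (the keys) and the texts, and any stray non-.txt file in the folder would simply be ignored.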

Using a shallow-linguistics approach we can investigate the artist review, which contains all the available reviews. To do so we use matplotlib and wordcloud to produce a wordcloud that will tell us more about the most frequent words in the text.

# Import packages
import matplotlib.pyplot as plt
%matplotlib inline

# Define a function to plot word cloud
def plot_cloud(wordcloud):
    # Set figure size
    plt.figure(figsize=(30, 10))
    # Display image
    plt.imshow(wordcloud)
    # No axis details
    plt.axis("off")

# Import package
from wordcloud import WordCloud, STOPWORDS
# Generate word cloud
wordcloud = WordCloud(width=3000, height=2000, random_state=1, background_color='white', collocations=False, stopwords=STOPWORDS).generate(artist_review)
# Plot
plot_cloud(wordcloud)
[Image: a wordcloud in which the most used words appear in a bigger font and the less used ones in a smaller font.]
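If you prefer numbers to a picture, the same shallow word-frequency idea can be sketched with the standard library alone. This is only an illustration: the tiny stopword set and the sample sentence are made up for the example, and the tokenisation (lowercase, letters and apostrophes only) is an assumption that will not match exactly what wordcloud does internally.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption for this sketch);
# the wordcloud package ships a much fuller one in STOPWORDS.
SAMPLE_STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "it", "with"}


def top_words(text, n=10, stopwords=SAMPLE_STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stopwords)
    return counts.most_common(n)


# Made-up sentence, not taken from the reviews
sample = "The piano and the ukulele meet folk music; the piano leads."
print(top_words(sample, n=3))
```

In the notebook you could call `top_words(artist_review, n=20, stopwords=STOPWORDS)` to list the words that end up largest in the cloud.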


How does their music make you feel?

Thanks to the wordcloud we know more about them: they use instruments such as ukulele, piano and violin, and they mix genres such as folk, ethnic and classical.
Still, we have no idea of the style of the artist. We can know more by looking at what emotions come out of their work.
To do so, we are going to use NL API. Please register here, find the documentation on the SDK here and on the features here.

### Install the python SDK
!pip install expertai-nlapi
## Code for initializing the client and then using the emotional-traits taxonomy
import os

# Set the credentials before creating the client
os.environ["EAI_USERNAME"] = 'your_username'
os.environ["EAI_PASSWORD"] = 'your_password'

from expertai.nlapi.cloud.client import ExpertAiClient
client = ExpertAiClient()

emotions = []
weights = []

output = client.classification(body={"document": {"text": artist_review}}, params={'taxonomy': 'emotional-traits', 'language': 'en'})

# Collect each emotion label together with its weight
for category in output.categories:
    emotion = category.label
    weight = category.frequency
    emotions.append(emotion)
    weights.append(weight)


To retrieve the weights we used “frequency”, which is actually a percentage: the sum of all the frequencies is 100. This makes the frequencies of the emotions a good candidate for a pie chart, which we plot using matplotlib.

# Import libraries
from matplotlib import pyplot as plt
import numpy as np

# Creating plot
colors = ['#0081a7', '#2a9d8f', '#e9c46a', '#f4a261', '#e76f51']
fig = plt.figure(figsize=(10, 7))
plt.pie(weights, labels=emotions, colors=colors, autopct='%1.1f%%')

# show plot
plt.show()
[Image: a pie chart representing each emotion and its percentage.]
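Since the frequencies are percentages, it can be handy to rank them and to double-check that they really add up to 100 before plotting. Below is a minimal sketch working on the plain emotions and weights lists. The helper names and the sample values are hypothetical and are not real NL API output.

```python
def rank_emotions(emotions, weights):
    """Pair each emotion with its weight, sorted from strongest to weakest."""
    return sorted(zip(emotions, weights), key=lambda pair: pair[1], reverse=True)


def as_percentages(weights):
    """Rescale weights so that they sum to 100 (useful if they do not already)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("cannot rescale weights that sum to zero")
    return [100 * weight / total for weight in weights]


# Placeholder labels and values, invented for this example
sample_emotions = ["emotion A", "emotion B", "emotion C"]
sample_weights = [20.0, 50.0, 30.0]

for emotion, weight in rank_emotions(sample_emotions, sample_weights):
    print(f"{emotion}: {weight:.1f}%")
```

In the notebook, `rank_emotions(emotions, weights)` would list the same slices as the pie chart, largest first.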


What's their best album?

If you wanted to start listening to them, to see if you feel the same emotions Scaruffi found in their work, where could you start? We can take a look at the sentiment analysis for each album and get an idea of their best ones. To do so, we iterate over each album's review and use NL API to retrieve its sentiment and its strength.

## Code for iterating over each album and retrieving the sentiment
sentiment_ratings = []
albums_names = [album[:-4] for album in albums]

for album in albums:
    album_path = os.path.join(artist_path, album)
    with open(album_path, 'r', encoding='utf8') as file:
        review = file.read()
    output = client.specific_resource_analysis(
        body={"document": {"text": review}},
        params={'language': 'en', 'resource': 'sentiment'})
    # Overall sentiment score of this album's review
    sentiment = output.sentiment.overall
    sentiment_ratings.append(sentiment)


Now we can visualize the sentiment for each review using a bar chart, which gives us quick visual feedback on the best album of the Penguin Cafe Orchestra and on their career. To do so we once again use matplotlib.

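import matplotlib.pyplot as plt
plt.style.use('ggplot')

# Album names are the file names without their four-character extension
albums_names = [name[:-4] for name in albums]
plt.bar(albums_names, sentiment_ratings, color='#70A0AF')
plt.ylabel("Album rating")
plt.title("Ratings of Penguin Cafe Orchestra's albums")

# Rotate the album names so they stay readable
plt.xticks(rotation=70)
plt.show()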
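To answer the question in this section's title without reading the answer off a chart, you can also pick the highest-rated album directly from the two lists. This is a small sketch. The function name and the sample names and scores are made up for illustration and are not real results.

```python
def best_album(albums_names, sentiment_ratings):
    """Return the (album_name, rating) pair with the highest sentiment rating."""
    if len(albums_names) != len(sentiment_ratings):
        raise ValueError("albums_names and sentiment_ratings must have the same length")
    if not albums_names:
        raise ValueError("no albums to compare")
    return max(zip(albums_names, sentiment_ratings), key=lambda pair: pair[1])


# Placeholder data, invented for this example
sample_names = ["album_one", "album_two", "album_three"]
sample_ratings = [4.5, 12.0, -3.2]

name, rating = best_album(sample_names, sample_ratings)
print(f"Highest-rated review: {name} ({rating})")
```

Keep in mind that this ranks the sentiment of the reviews, so it reflects the reviewer's wording rather than an absolute judgement of the music.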


1 reply


And The Penguin Cafe is awesome!  NL API + great music recommendation = value add….thanks Laura