What is the hardware physically behind Tidal, Spotify, Qobuz?


Dear Late Night Audiogoners,

I'm calling into the show tonight because I was wondering about something you may know: what do the hardware (and software) look like that allow streaming services to operate?

Are we talking massive server rooms?  

Is there a physical process involved in getting music into a format to store for streaming?  Do the record companies need to transfer a digital file to the service for streaming? 

-Curious in Chicago
jbhiller

The system architecture for building and running a large-scale streaming service involves a bit more than massive farms of servers. A good analogy is a retail supply chain network, e.g. Walmart or Kroger. Brick and mortar retail companies typically have a few massive central warehouses, which feed into multiple tiers of smaller regional distribution centers. From there, product is shipped to grocery stores, or in the case of online orders, directly to consumers’ homes or restaurants.

A company offering streaming services follows a similar pattern. It uses a technology called a ‘content delivery network’: a geographically distributed group of servers which work together to provide fast delivery of internet content (music and metadata). Just like retail operations, the streaming service's content delivery network consists of a few large data centers (massive server farms) and numerous, geographically dispersed, smaller data centers, with each user routed to one nearby based on geographical proximity, network latency and other factors. When a user in China streams music via Tidal or Qobuz, in most cases it is served from a smaller, regional data center located closer to the user. The various data centers are connected to each other via high speed, high bandwidth networking systems which are just as important as the servers that process and store the music.
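To make the "closest data center" idea concrete, here is a rough Python sketch. It assumes the only factor is straight-line distance, and the node names and coordinates are made up for illustration. Real networks also weigh measured latency, server load and routing, so treat this as a toy model rather than how any particular service does it.

```python
import math

# Hypothetical edge locations (name -> latitude, longitude).
# These are illustrative only, not any real service's topology.
EDGE_NODES = {
    "chicago": (41.88, -87.63),
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_edge(user_location, nodes=EDGE_NODES):
    """Return the name of the edge node geographically closest to the user."""
    return min(nodes, key=lambda name: haversine_km(user_location, nodes[name]))
```

Calling `pick_edge((39.90, 116.40))` for a listener in Beijing would pick the Singapore node out of these three, since it is the nearest by distance.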

In addition, more and more streaming services are outsourcing the provisioning of servers and networking to cloud service providers like Amazon, Microsoft, Google and Akamai. Very few are running their own data centers anymore. For example, Spotify runs its backend on Google Cloud. A positive aspect of this trend is that it lowers the barrier to entry for many smaller players who otherwise would have required massive capital investments just to get started.

For data formats and conversions, the companies use a type of software called a ‘media transcoder’ to convert media files from their source versions (e.g. as delivered by the content creators) into multiple formats that can be played back on devices like smartphones, tablets, PCs, etc. The transcoder software runs on a type of server optimized for compute power. Once transcoded, the various formats are stored on yet another type of server which is optimized for storage.
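As a rough illustration of the transcoding step, the sketch below builds one ffmpeg command per output format from a single master file. The list of targets (a lossless FLAC plus two AAC bitrates) and the file naming are assumptions picked for the example, not what any particular service actually produces. It only builds the command lines and does not run them.

```python
from pathlib import Path

# Hypothetical output ladder: (label, ffmpeg audio codec, bitrate, extension).
# A bitrate of None means lossless, so no bitrate flag is passed.
TARGETS = [
    ("lossless", "flac", None, "flac"),
    ("high", "aac", "256k", "m4a"),
    ("low", "aac", "96k", "m4a"),
]


def build_commands(master, out_dir):
    """Build one ffmpeg argument list per target format for a master file."""
    master = Path(master)
    commands = []
    for label, codec, bitrate, ext in TARGETS:
        out_path = Path(out_dir) / f"{master.stem}_{label}.{ext}"
        cmd = ["ffmpeg", "-i", str(master), "-c:a", codec]
        if bitrate:
            cmd += ["-b:a", bitrate]  # target audio bitrate for lossy codecs
        cmd.append(str(out_path))
        commands.append(cmd)
    return commands
```

Each list could then be handed to something like `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed. In a real pipeline this work would be spread across many compute servers, with the results copied to storage servers afterwards.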

On the software side, two very important technologies are graph databases and machine learning. Together, they are responsible for analyzing your song/artist/genre choices, demographics, age, and various other factors to compile ‘playlists’ that are based on a deep understanding of your listening habits. These technologies are ‘self-learning’, i.e. the more you use them, the smarter and more accurate they get. As the hardware and network aspects of operations become commoditized (i.e. outsourced to cloud service providers), the companies that will survive and prosper will be the ones that best utilize modern technologies like machine learning and artificial intelligence.
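A very stripped-down version of the idea can be sketched in a few lines of Python. It treats listeners and tracks as a small graph and suggests tracks played by people whose history overlaps with yours. The listening data is invented, and real systems use far richer models, so this only illustrates the principle.

```python
from collections import Counter

# Hypothetical listening history: listener -> set of tracks played.
HISTORY = {
    "ann": {"kind_of_blue", "a_love_supreme", "time_out"},
    "bob": {"kind_of_blue", "time_out", "head_hunters"},
    "cat": {"a_love_supreme", "head_hunters", "bitches_brew"},
}


def recommend(listener, history=HISTORY, top_n=3):
    """Suggest unheard tracks, weighted by how much each other listener overlaps."""
    mine = history[listener]
    scores = Counter()
    for other, tracks in history.items():
        if other == listener:
            continue
        overlap = len(mine & tracks)  # shared tracks act as the similarity weight
        for track in tracks - mine:   # only tracks this listener has not played
            scores[track] += overlap
    return [track for track, score in scores.most_common(top_n) if score > 0]
```

With this data, `recommend("ann")` returns `["head_hunters", "bitches_brew"]`: both other listeners share tracks with ann and both played head_hunters, so it scores highest. The more history there is, the better the overlap signal gets, which is the "self-learning" effect described above.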

Of course, the explanation above is an oversimplification of what is actually involved, but hopefully it gives you an idea of some basic underlying technologies.

They will all have moved to the outsourced, hyper-scaling models offered by the ones cited above (AWS, Azure, Google, Akamai) and a group of up-and-comers such as Oracle.  The barriers to renting capacity are low, the capital requirements for building your own data centers are huge, and it just makes sense.  Online video services like Zoom do the exact same thing.