The system architecture for building and running a large-scale streaming service involves a bit more than massive farms of servers. A good analogy is a retail supply chain network, such as Walmart's or Kroger's. Brick-and-mortar retail companies typically have a few massive central warehouses, which feed into multiple tiers of smaller regional distribution centers. From there, product is shipped to grocery stores or, in the case of online orders, directly to consumers' homes or to restaurants.
A company offering streaming services follows a similar pattern. It utilizes a technology called a 'content delivery network' (CDN) – a geographically distributed group of servers that work together to provide fast delivery of internet content (music and metadata). Just like a retail operation, the streaming service's content delivery network consists of a few large data centers (massive server farms) and numerous smaller, geographically dispersed data centers, placed close to end users based on geographic proximity, network latency, and other factors. When a user in China streams music via Tidal or Qobuz, in most cases the music is served from a smaller regional data center located closer to that user. The various data centers are connected to each other via high-speed, high-bandwidth networks, which are just as important as the servers that process and store the music.
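To make the routing idea concrete, here is a minimal sketch in Python. All of the data center names and latency figures are hypothetical: among the servers holding a cached copy of a track, the request goes to the one with the lowest measured latency, falling back to a large central origin when no edge has the track.

    # Minimal sketch of CDN request routing (hypothetical names and numbers).
    EDGE_SERVERS = {
        "us-east":    {"latency_ms": 120, "has_track": True},
        "eu-west":    {"latency_ms": 210, "has_track": True},
        "asia-east":  {"latency_ms": 35,  "has_track": True},
        "asia-south": {"latency_ms": 60,  "has_track": False},
    }

    ORIGIN = "central-origin"  # massive central data center, always has the track

    def pick_server(servers: dict) -> str:
        """Return the lowest-latency edge that has the track cached,
        falling back to the central origin on a cache miss everywhere."""
        candidates = {name: info for name, info in servers.items() if info["has_track"]}
        if not candidates:
            return ORIGIN
        return min(candidates, key=lambda name: candidates[name]["latency_ms"])

    print(pick_server(EDGE_SERVERS))  # -> "asia-east", the closest edge with a copy

Real CDNs make this decision with DNS tricks, anycast routing, and live network measurements, but the principle is the same: serve each request from the nearest healthy copy.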
Additionally, more and more streaming services are outsourcing the provisioning of servers and networking to cloud service providers like Amazon, Microsoft, Google, and Akamai. Very few are running their own data centers anymore. For example, Spotify runs its content delivery network in Google Cloud. A positive aspect of this trend is that it lowers the barrier to entry for many smaller players who otherwise would have required massive capital investments just to get started.
For data formats and conversions, the companies use a type of software called a 'media transcoder' to convert media files from their source versions (e.g. as delivered by the content creators) into multiple formats that can be played back on devices like smartphones, tablets, PCs, etc. The transcoder software runs on a special type of server optimized for compute power. Once transcoded, the various formats are stored on yet another type of server, this one optimized for storage.
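As a rough illustration, here is a minimal Python sketch of that transcoding step. It assumes the open-source ffmpeg tool is installed; the source file name and the particular output renditions are hypothetical, and a real service would produce many more.

    # Minimal transcoding sketch: one lossless source, a few delivery formats.
    # Requires the ffmpeg command-line tool; encoder availability (libmp3lame,
    # libopus) depends on how ffmpeg was built.
    import subprocess

    SOURCE = "master.flac"  # hypothetical lossless file from the content creator

    RENDITIONS = [
        ("aac_256k.m4a",   ["-c:a", "aac",        "-b:a", "256k"]),
        ("mp3_320k.mp3",   ["-c:a", "libmp3lame", "-b:a", "320k"]),
        ("opus_128k.opus", ["-c:a", "libopus",    "-b:a", "128k"]),
    ]

    for out_name, codec_args in RENDITIONS:
        # -y overwrites existing output; -vn drops any embedded artwork/video
        subprocess.run(["ffmpeg", "-y", "-i", SOURCE, "-vn", *codec_args, out_name],
                       check=True)
        print(f"wrote {out_name}")

In production this loop would be fanned out across a fleet of compute-optimized servers, with the results written to the storage-optimized tier described above.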
On the software side, two very important technologies are graph databases and machine learning. Together, they are responsible for analyzing your song, artist, and genre choices, your demographics, and various other factors to compile 'playlists' based on a deep understanding of your listening habits. These technologies are 'self-learning', i.e. the more you use them, the smarter and more accurate they get. As the hardware and network aspects of operations become commoditized (i.e. outsourced to cloud service providers), the companies that survive and prosper will be the ones that best utilize modern technologies like machine learning and artificial intelligence.
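To give a feel for the graph idea, here is a minimal Python sketch with made-up listening data: unheard songs are scored by how popular they are among users whose listening history overlaps yours. Real services build far richer graphs and layer trained machine-learning models on top.

    # Minimal graph-style recommendation sketch (all names are hypothetical).
    from collections import Counter

    LISTENS = {  # user -> set of songs played
        "ana":   {"song_a", "song_b", "song_c"},
        "ben":   {"song_b", "song_c", "song_d"},
        "chloe": {"song_c", "song_e"},
    }

    def recommend(user: str, listens: dict, top_n: int = 2) -> list:
        """Score songs the user hasn't heard by how many similar users
        played them, weighting each neighbor by shared listening history."""
        heard = listens[user]
        scores = Counter()
        for other, songs in listens.items():
            if other == user:
                continue
            overlap = len(heard & songs)  # how similar this neighbor is
            for song in songs - heard:
                scores[song] += overlap
        return [song for song, _ in scores.most_common(top_n)]

    print(recommend("ana", LISTENS))  # -> ['song_d', 'song_e']

This also shows the 'self-learning' aspect in miniature: every new play adds an edge to the graph, so the scores, and hence the playlists, improve with use.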
Of course, the explanation above is an oversimplification of what is actually involved, but hopefully it gives you an idea of some of the basic underlying technologies.