Why does a music server require high CPU processing power?


I noticed that some music servers use, for example, dual multicore CPUs running under a custom-assembled operating system.  In addition, the server is powered by a linear power supply with choke regulation and a large capacitor bank using the highest audiophile-grade capacitors.  Various other music servers have similar CPU processing capabilities.

I know that music is played in real time, so there is not much time to do large amounts of processing.  I also know that the data stream needs to be free of jitter and all other forms of extra noise and distortion.  I believe that inputs and outputs are happening at the same time (I think).

I also know that music servers need to support file formats such as FLAC, ALAC, WAV, AIFF, MP3, AAC, OGG, WMA, WMA-L, DSF, and DFF; native sampling rates of 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, 192kHz, 352.8kHz, 384kHz, 705.6kHz, and 768kHz; DSD formats of DSD64, DSD128, DSD256, and DSD512; and bit depths of 16 and 24.

Why does a music server require high processing power?  Does the list of supported formats above require high processing power?  Assuming the music server is not a DAC or a pre-amp, what is going on that requires this much processing power?

What processing is going on in a music server?  How much processing power does a music server require?  

Am I missing something?   Thanks.   


hgeifman

Showing 4 responses by erik_squires

Here, take a look at the GNU gzip code:
https://savannah.gnu.org/projects/gzip/
That may help you understand. Every instruction takes CPU time.  The faster the CPU, the less time that instruction takes.

Best,
E
Roon may over-estimate its CPU requirements, mostly because they want to specify a CPU that is capable of doing all the upsampling you might ask it to do.

I’m running an 8-year-old AMD A10 with no issues, but if I attempt high-rate DSD upsampling it won’t keep up. That also sounds bad IMHO, so I lose nothing.

Also, Roon does all the EQ and upsampling in the server, so you need to account for that for every DAC endpoint.
But overall, I think it’s pretty light. I’ve also seen Logitech Media Server run on routers. That’s extremely lightweight.
Nearly all the CPU-intensive activities you cite (except for equalization) are the responsibility of the DAC.


Um, what?? Depends. In my world, a DAC takes S/PDIF or USB in and produces an analog signal out.

While upsampling may be done by the DAC, file filtering and streaming are not. Many manufacturers may choose to implement their own upsampling algorithms before the DAC.

I’d also like to point out that the latency issue you raised does not affect sound quality, but rather the time it takes after pressing Play until the music starts.

You misunderstand how I meant latency. I did not mean latency to user input; I meant the time between a CPU starting to process a sample and the time it is done with it, also known as CPU time. In order to do any processing, a CPU takes time. We can estimate the worst-case boundaries. For a 44.1kHz signal, the CPU must complete all of its work in 1/44,100 of a second, and have that sample of data ready for the DAC. If the signal has a higher sample rate, or is being upsampled, the CPU must complete its work in 1/88,200 of a second for an 88.2kHz sample/upsample.

That’s about 11 microseconds. That’s the absolute maximum amount of time the CPU is allowed to take, or else the output will not keep up with the input. It must do at least the format conversion, EQ and upsampling. This is in addition to any network housekeeping and UI interactions, and in addition to responding to the DAC clock saying "Gimme the next one!"

So, if a CPU core takes EXACTLY 11 microseconds to process this sample, it has ZERO time to process anything else. It has no time for network housekeeping, library management or UI responses. Any additional work would get queued and the audio output would stall.
If the CPU takes 11/2 usecs (about 5.5 microseconds), then the CPU core is only 50% utilized, and has processing power left to schedule other work.
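To make the arithmetic above concrete, here is a small sketch of the per-sample budget and the resulting core utilization. The function names and the per-sample processing times are my own illustration values, not measurements of any real server:

```python
# Per-sample time budget and resulting single-core utilization.
# Processing times here are hypothetical, purely to show the arithmetic.

def budget_us(sample_rate_hz):
    """Maximum time (microseconds) allowed per sample at a given rate."""
    return 1_000_000 / sample_rate_hz

def utilization(process_time_us, sample_rate_hz):
    """Fraction of one core consumed if each sample takes process_time_us."""
    return process_time_us / budget_us(sample_rate_hz)

print(round(budget_us(88_200), 1))    # about 11.3 us per sample at 88.2 kHz
print(utilization(11.3 / 2, 88_200))  # about 0.5: half a core, headroom left
```

If `utilization` ever exceeds 1.0, the core cannot keep up and output stalls; below 1.0, the leftover fraction is what remains for housekeeping and UI work.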

It’s important to stress that you must have input/output flow balance. That is, the rate of processing must be equal to or faster than the rate of output. You can’t make up for this with a longer delay before a track starts, or larger buffer sizes, unless they are infinitely large. Imagine a CPU that takes 22 microseconds to process an 88.2kHz sample, and that the track is 5 minutes long. That means the CPU won’t finish with the last sample for 10 minutes. You’d need a five-minute head start: you press Play, wait five minutes while the buffer fills, and then the music starts. Ten minutes after you hit Play, the 5-minute track finally finishes playing.

Lastly, we should of course note that we are talking about stereo samples. That is, at 88.2 kHz we must complete the decompression, EQ and upsampling of 2 samples every 11 microseconds.

So, if you ask me, how much CPU power do you need?

Well, you need enough CPU power to completely process every sample in real time, plus handle all the additional work.

We can calculate the maximum allowed compute time this way:

seconds = 1 / sample rate

So, you must use a CPU capable of meeting this, AND still have enough power to handle the other events that are happening in semi-real time. This calculation provides the minimum boundary.
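Applying that seconds = 1 / sample rate boundary to the PCM rates listed in the original question gives the per-sample budget at each rate (a simple illustration, printed as a table):

```python
# Maximum per-sample compute time (microseconds) for each PCM rate
# from the question. This is the hard real-time boundary, per channel.
rates_khz = [44.1, 48, 88.2, 96, 176.4, 192, 352.8, 384, 705.6, 768]

for rate in rates_khz:
    budget_us = 1_000_000 / (rate * 1000)
    print(f"{rate:6.1f} kHz -> {budget_us:6.2f} us per sample")
```

Note how the budget shrinks from about 22.7 us at 44.1kHz to about 1.3 us at 768kHz, which is why the highest rates dominate the CPU requirement.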

Of course, for many reasons, having the CPU run at 100% is not a good thing in an interactive system or RTOS, so a target of 50% CPU utilization in an embedded CPU is reasonable. The amount of CPU power, whether we think of it in terms of MIPS, SPECints or FLOPS, is therefore dependent on the work that must be done in real time. Do less work, and you can get by with a cheaper, slower CPU.

I don’t really think it’s "a lot" either. I mean, we live at a time when an 8 GByte, 4-core, 64-bit ARM-based Raspberry Pi sells for $120 and is fully capable of serving as a desktop PC. It is important, however, to note that unless you know the architecture, clock speed, cores and threads, you don’t really know much about "how much" compute power you have. You can’t say "well, I have 4 cores, so I can run Grand Theft Auto." They’re all small, rectangular, and have a lot of pins on them. That doesn’t make them interchangeable in terms of the work they can do per unit time.

For instance, your average router has a pretty beefy, multicore Broadcom system on a chip (including CPU) in it too, and all it’s doing is moving packets around. Is it "too much"? I doubt it.


Best,
E

Hi Hgeifman,

Needing lots of CPU processing is not necessarily the same as needing multiple cores.

Scheduling work on a CPU is complicated, and desktop OSes are not designed to be real-time (RTOS) or to guarantee latency between the time something is requested and the time it is processed.

Having multiple cores helps if you can dedicate one core to streaming and leave the other cores for tasks like waiting for user events or indexing music.

In terms of things that actually consume processing power, upsampling and equalization are the big two, and this varies with the type of upsampling and the complexity of the EQ.

Simple parametric EQs are usually benign, while room correction and convolution can really soak up processing time. As well, upsampling to DSD seems to be (based on Roon) a big CPU consumer.

I use an AMD A10 processor with 4-5 PEQs and filters, upsampling PCM by 2x, and the CPU load is really light: it’s overkill, even for an 8-year-old CPU. It uses about 5% of one thread, but if I upsample to DSD it will use up nearly 75% or more of it.

Of course, MQA decoding will also add to this a little.

So, it depends, but CPU power is cheap these days, and it is easier to design a system with guaranteed latency if you have multiple cores than if you don’t.

Lastly, it's important to note that the CPU must do many things at once: reading and decompressing data from a file or external streaming source, as well as feeding that data to the metronome of the output clock.  More cores help facilitate this too, but the total processing power (that is, the amount of computation that must occur on the chip) may not actually be all that much.
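As a rough sketch of that juggling act, here is a minimal producer/consumer arrangement: one thread stands in for the reader/decoder, another for the output side pulling samples at the DAC's pace. This is purely illustrative, with made-up names; real players use ring buffers and driver callbacks, not Python queues:

```python
# Minimal sketch of decode and output running concurrently.
# One thread "decodes" samples into a bounded buffer; another drains
# the buffer, standing in for the DAC clock pulling data on schedule.
import threading
import queue

buf = queue.Queue(maxsize=1024)   # bounded buffer between the two stages

def decoder(n_samples):
    for i in range(n_samples):
        buf.put(i)                # stand-in for read + decompress + EQ

def output(n_samples, results):
    for _ in range(n_samples):
        results.append(buf.get()) # the "metronome" consuming at its own pace

results = []
t1 = threading.Thread(target=decoder, args=(10_000,))
t2 = threading.Thread(target=output, args=(10_000, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(results))               # every sample delivered, in order
```

The bounded queue is the key detail: if the decoder falls behind, the output side blocks on `get()`, which is exactly the stall described above.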

Best,


E