Ethernet and streaming


After reading many interesting posts on Ethernet cables and switches, I thought it would be good to describe how Ethernet and other networking protocols are used in streaming services. This post does not cover the use of USB, AES or coax cables. The statements made here are factual and drawn from other authors. Please feel free to correct any factual inaccuracies. The following topics are covered:

  • Streaming Architecture

  • Media Server

  • Media Renderer

  • Ethernet Data Transmission

  • Ethernet Cable, Noise and Differential Signaling

  • Ethernet Clock

  • Ethernet Switch

  • Summary


Streaming Architecture:

Network audio is based upon error-free bulk data transfer. It uses the same or similar technology that delivers the web page you are currently reading. Qobuz, Tidal, Roon and UPnP/DLNA have similar architectures. They all use the Ethernet, IP and TCP protocols in the same manner and differ in their use of the upper-level protocols. Since Qobuz, Tidal and Roon are closed architectures, we will use UPnP/DLNA as an example.

Basically, a media server provides media discovery and media transport to a media renderer, which converts a binary audio file to analog audio. The following diagram depicts the UPnP/DLNA architecture and protocol suite.

[Diagram: UPnP/DLNA architecture and protocol suite]

Media Server:

In this example we will transport Jennifer Warnes' "First We Take Manhattan" from the media server to the media renderer. The file is in uncompressed 96/24 PCM format and is approximately 133 megabytes in length. For simplicity we will ignore the upper-level protocols and concentrate on Ethernet, IP and TCP.

The media server uses the TCP process to divide the file into TCP packets (formally called segments). The TCP process forwards each packet to the IP process, which encapsulates the TCP packet in an IP packet. The IP process forwards the IP packet to the Ethernet process, which encapsulates the IP packet in an Ethernet frame and transmits the frame on the Ethernet cable.
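The layering described above can be sketched in a few lines of Python. This is only a toy model: the dictionaries stand in for real headers, and the function names, the addresses and the 1,460-byte chunk size are assumptions made for illustration, not part of any actual media server.

```python
# Toy model of the sending side. The "headers" are plain dictionaries,
# not the real TCP, IP or Ethernet wire formats.
CHUNK = 1460  # assumed bytes of file data carried per TCP packet


def tcp_packets(data: bytes, chunk: int = CHUNK):
    """TCP process: split the file into numbered packets."""
    for offset in range(0, len(data), chunk):
        yield {"seq": offset, "payload": data[offset:offset + chunk]}


def ip_encapsulate(tcp_packet: dict, src: str, dst: str) -> dict:
    """IP process: wrap one TCP packet in an IP packet."""
    return {"src": src, "dst": dst, "tcp": tcp_packet}


def ethernet_encapsulate(ip_packet: dict, src_mac: str, dst_mac: str) -> dict:
    """Ethernet process: wrap one IP packet in an Ethernet frame."""
    return {"src_mac": src_mac, "dst_mac": dst_mac, "ip": ip_packet}


def build_frames(data: bytes) -> list:
    """Run the file through all three processes, one frame per TCP packet."""
    frames = []
    for packet in tcp_packets(data):
        ip_packet = ip_encapsulate(packet, "192.168.1.10", "192.168.1.20")
        frames.append(
            ethernet_encapsulate(ip_packet, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")
        )
    return frames


frames = build_frames(b"\x00" * 3000)  # 3,000 bytes of stand-in audio data
print(len(frames), frames[1]["ip"]["tcp"]["seq"])
```

Each frame carries exactly one IP packet, which in turn carries exactly one TCP packet; the renderer unwraps them in the reverse order.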

It is common to limit the TCP/IP packet size so that each packet fits in one Ethernet frame. Doing this requires more than 91,000 frames to transmit the file.
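The frame count can be checked with a little arithmetic. The sketch below assumes a standard 1,500-byte Ethernet payload and 20-byte IP and TCP headers with no options; the file size is the approximate figure given above, taken as 133,000,000 bytes.

```python
FILE_BYTES = 133_000_000   # "approximately 133 megabytes" (assumed exact here)
ETHERNET_PAYLOAD = 1500    # standard maximum Ethernet payload in bytes
IP_HEADER = 20             # IPv4 header without options
TCP_HEADER = 20            # TCP header without options

# Bytes of audio data that fit in one frame after the headers are subtracted.
data_per_frame = ETHERNET_PAYLOAD - IP_HEADER - TCP_HEADER

# Ceiling division: a partly filled last frame still counts as a frame.
frames_needed = -(-FILE_BYTES // data_per_frame)

print(data_per_frame, frames_needed)  # 1460 91096
```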

Media Renderer:

We will now examine the renderer's processing in a little more detail. The renderer contains a CPU, system memory (RAM) and a system bus. It also contains a CPU clock, which is used to synchronize operations (executing CPU instructions, moving data to and from memory, moving data onto the system bus).

Each process element (TCP, IP and Ethernet) has a designated space in system memory in which to perform its work. When data is transferred from one process element to another, the data is copied or moved from one location in system memory to another. We will refer to this movement of data between the process elements as streaming.

The Ethernet process assembles an Ethernet frame from the electrical signals on the cable. Once a frame has been identified, its checksum (the frame check sequence, a CRC) is verified. If any of the bits were received in error, the entire frame is discarded. Otherwise the contents of the Ethernet frame are streamed to the IP process.
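The check-and-discard step can be sketched with the CRC-32 function in Python's standard library, which uses the same polynomial as the Ethernet frame check sequence. This is a simplified illustration: in practice the check is done by the network hardware, and the helper names `append_fcs` and `receive_frame` are made up for this example.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 to the frame."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def receive_frame(frame: bytes):
    """Receiver side: return the frame contents, or None if the frame is discarded."""
    if len(frame) < 4:
        return None
    body = frame[:-4]
    (received_fcs,) = struct.unpack("<I", frame[-4:])
    if zlib.crc32(body) & 0xFFFFFFFF != received_fcs:
        return None  # at least one bit was received in error
    return body


good = append_fcs(b"96/24 PCM samples")
damaged = bytearray(good)
damaged[0] ^= 0x01  # flip a single bit "on the cable"

print(receive_frame(good))            # the original contents
print(receive_frame(bytes(damaged)))  # None: the whole frame is dropped
```

Note that Ethernet itself never asks for a retransmission; a discarded frame is simply gone, and recovery is left to TCP.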

The IP process extracts an IP packet from the Ethernet frame. If the packet is valid and the destination address matches the renderer's address, the TCP packet is extracted from the IP packet and streamed to the TCP process.
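A rough sketch of those two checks for IPv4 follows. It assumes a plain 20-byte header with no options, treats "valid" as meaning only that the header checksum verifies, and uses made-up addresses and helper names (`build_header`, `ip_receive`).

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement sum of the header's 16-bit words, then inverted."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(src: str, dst: str, payload_len: int) -> bytes:
    """Sender side: a minimal 20-byte IPv4 header carrying TCP (protocol 6)."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0,
                         64, 6, 0, socket.inet_aton(src), socket.inet_aton(dst))
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def ip_receive(packet: bytes, my_address: str):
    """Receiver side: return the TCP packet, or None if the IP packet is dropped."""
    header = packet[:20]
    if len(header) < 20 or ipv4_checksum(header) != 0:
        return None  # header damaged
    if socket.inet_ntoa(header[16:20]) != my_address:
        return None  # addressed to some other device
    return packet[20:]


packet = build_header("192.168.1.10", "192.168.1.20", 4) + b"data"
print(ip_receive(packet, "192.168.1.20"))  # b'data'
print(ip_receive(packet, "192.168.1.99"))  # None
```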

The TCP process verifies the packet checksum to ensure that the data has not been corrupted. If the checksum does not verify, the packet is discarded. The packet's sequence number is then compared to the next expected sequence number. If the numbers match, the contents of the packet are streamed to the next process in the process chain and the sequence number is acknowledged to the media server. If the received sequence number is less than the expected sequence number, the packet is discarded. If the received sequence number is greater than the expected sequence number, the packet can be discarded or saved for later processing. The media server automatically retransmits packets that are not acknowledged within a defined time limit.
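The sequence number rules above can be sketched as a small receiver. This is a simplified model, not a real TCP implementation: the class `TcpReceiver` is made up for illustration, sequence numbers start at zero and never wrap, the checksum result is passed in as a flag, and out-of-order packets are always saved rather than discarded.

```python
class TcpReceiver:
    def __init__(self, initial_seq: int = 0):
        self.expected = initial_seq   # next sequence number we want
        self.saved = {}               # out-of-order packets, keyed by sequence number
        self.delivered = bytearray()  # data streamed to the next process

    def _deliver(self, payload: bytes) -> None:
        self.delivered += payload
        self.expected += len(payload)

    def receive(self, seq: int, payload: bytes, checksum_ok: bool = True) -> str:
        if not checksum_ok:
            return "discard: bad checksum"
        if seq < self.expected:
            return "discard: already received"
        if seq > self.expected:
            self.saved[seq] = payload
            return "saved for later"
        self._deliver(payload)
        # Anything saved earlier that is now in order can be delivered too.
        while self.expected in self.saved:
            self._deliver(self.saved.pop(self.expected))
        # The acknowledgement names the next byte the receiver expects.
        return f"ack {self.expected}"


rx = TcpReceiver()
print(rx.receive(0, b"aaaa"))  # ack 4
print(rx.receive(8, b"cccc"))  # saved for later (bytes 4..7 are missing)
print(rx.receive(0, b"aaaa"))  # discard: already received
print(rx.receive(4, b"bbbb"))  # ack 12 (the saved packet is delivered as well)
```

The sender's side of the bargain, retransmitting anything that is not acknowledged in time, is what turns an unreliable stream of frames into an error-free file.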

After enough data has accumulated in system memory, the process of generating an analog signal from the digital data may begin.