
10. - SIP SDP - Session Description Protocol

SIP SDP

SDP stands for Session Description Protocol, a format used to describe multimedia sessions for the purposes of session initiation, control, and management. In the context of SIP, SDP is carried in the body of SIP messages to describe the media streams being offered or negotiated in a SIP session. This includes information such as the type of media, the codecs, the bandwidth, and the IP addresses and ports for the media streams. SDP matters to SIP because it is how endpoints negotiate and establish media sessions for audio, video, and other types of multimedia content.

There are many fields available in SDP, and some are mandatory (shown here in blue). New attributes (a= lines) can be added as needed; if an agent doesn't understand an attribute, it simply ignores it.

image.png

 

 

 

MIME

MIME stands for Multipurpose Internet Mail Extensions, which is a standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. In the context of SIP SDP (Session Description Protocol), MIME is used to specify the media type and format of the SDP body in the SIP message. The MIME type for SDP is "application/sdp". The SDP body contains information about the media streams to be used for the session, such as the codecs, transport protocols, and media formats to be used. The MIME type allows the receiving endpoint to determine how to handle the SDP body in the SIP message.
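
As a rough illustration of how a receiver might use the Content-Type header before touching the body, here is a minimal Python sketch. The function names and the sip:bob@example.com URI are placeholders invented for this example, and the parsing is deliberately naive: it assumes one header per line, with no folded lines, compact header forms, or repeated headers.

```python
def split_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body)."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    lines = head.split("\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def sdp_body(raw: str):
    """Return the body only if Content-Type says it is SDP, else None."""
    _, headers, body = split_sip_message(raw)
    # Ignore any parameters after ';' and compare case-insensitively.
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    return body if media_type == "application/sdp" else None


# Hypothetical, shortened message used only to exercise the functions.
message = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "s=SIP Call\r\n"
)
print(sdp_body(message))
```

A body with any other Content-Type (for example text/plain) makes sdp_body return None, so the caller never tries to parse it as SDP.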

 

 

INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:[email protected]>
From: Alice <sip:[email protected]>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:[email protected]>
User-Agent: BriaX 3.5.5 build 81887
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY, MESSAGE, SUBSCRIBE, INFO
Allow-Events: presence, message-summary, refer, dialog
Session-Expires: 3600;refresher=uas
Content-Type: application/sdp
Content-Length: 283

v=0
o=- 123456 789012 IN IP4 pc33.atlanta.com
s=SIP Call
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0 8 18 3 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
m=video 49172 RTP/AVP 31 34
a=rtpmap:31 H261/90000
a=rtpmap:34 H263/90000

 

In this example, the header includes the following fields:

  • Via: Indicates the transport protocol and the host (and optionally the port) to which responses should be sent, along with a branch parameter that identifies the transaction. In this case, the transport is UDP and the host is pc33.atlanta.com.
  • Max-Forwards: Indicates the maximum number of times the request can be forwarded before it is discarded.
  • To: Contains the display name and SIP URI of the destination user.
  • From: Contains the display name and SIP URI of the source user, along with a tag parameter that, together with the To tag and the Call-ID, identifies the dialog.
  • Call-ID: Unique identifier for the call.
  • CSeq: Sequence number and method of the request.
  • Contact: Indicates the SIP URI that can be used to reach the sender of the request.
  • User-Agent: Indicates the user agent that generated the request.
  • Allow: Lists the SIP methods that the user agent is able to handle.
  • Allow-Events: Lists the event packages that the user agent supports.
  • Session-Expires: Indicates the session interval in seconds and which side is responsible for refreshing the session (in this case, uas, the user agent server).
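
Several of these headers carry parameters after a semicolon. As a small sketch of how such a value could be taken apart, the hypothetical helper below parses the Session-Expires value from the example. It assumes a plain "seconds;name=value" layout and is not a complete header parser.

```python
def parse_session_expires(value: str):
    """Parse a Session-Expires value such as '3600;refresher=uas'.

    Returns (seconds, refresher), where refresher is None if absent.
    """
    parts = [part.strip() for part in value.split(";")]
    seconds = int(parts[0])
    params = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip()
    return seconds, params.get("refresher")


print(parse_session_expires("3600;refresher=uas"))
```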


The SDP message in the body of the request describes the media capabilities of the sender and includes the following information:

  • v (Protocol version): Indicates the protocol version of the session description. In this case, it is 0.
  • o (Origin): Specifies the originator of the session and a unique identifier for the session. The fields are the username, session identifier, session version number, network type (IN for Internet), address type (IP4 for IPv4), and the unicast address of the originating host.

    o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>

    The nettype and addrtype fields can take various values depending on the network being used. In SIP deployments they are almost always IN with IP4 or IP6. The nettype values registered with IANA include:

    • IN: Internet
    • TN: Telephone Network (RFC 2848)
    • ATM: Asynchronous Transfer Mode (RFC 3108)
    • PSTN: Public Switched Telephone Network (RFC 7195)

    Some of the options for the addrtype field include:

    • IP4: IPv4 address
    • IP6: IPv6 address
    • NSAP: Network Service Access Point address (used with ATM)
    • GWID: Gateway identifier (used with ATM)
    • E164: E.164 telephone number (used with ATM or PSTN)
  • s (Session name): Specifies a human-readable session name.
  • c (Connection information): Specifies the connection information for the session. In this case, it is the IP address 192.0.2.1. The field gives the network type, the address type (IP4 or IP6), and the connection address (an IP address or hostname) of the media stream. Some examples:
    • c=IN IP4 192.168.1.100: This specifies that the media stream will use IPv4 and the IP address is 192.168.1.100.
    • c=IN IP6 ::1: This specifies that the media stream will use IPv6 and the IP address is ::1 (loopback address).
    • c=IN IP4 224.2.36.42/127: This specifies the IPv4 multicast address 224.2.36.42 with a time to live (TTL) of 127. The /<ttl> suffix applies only to multicast addresses; it is not a subnet mask.

  • t (Timing): Specifies the start and stop times of the session. In this case, both values are 0, which means the session is not limited by time.
  • m (Media): Specifies the media type (audio), the port number (49170), the transport protocol (RTP/AVP), and the list of payload types on offer. There can be multiple media fields in an SDP message.
    • m=audio 49170 RTP/AVP 0 8: This specifies an audio stream on port 49170 using the RTP/AVP profile, with two codecs: PCMU (payload type 0) and PCMA (payload type 8).
    • m=video 51372 RTP/AVP 31 34: This specifies a video stream on port 51372 using the RTP/AVP profile, with two codecs: H.261 (payload type 31) and H.263 (payload type 34).
  • a (Attribute): Specifies additional attributes for the session or for a media stream. In this case, the rtpmap lines give the codec name and clock rate for each RTP payload type (for example 0 for PCMU, 8 for PCMA, and 101 for telephone-event, all at 8000 Hz). The a=fmtp:101 0-16 line specifies the range of event codes supported for telephone-event. There can be multiple attribute fields in an SDP message.
    • a=rtpmap:0 PCMU/8000: This specifies the codec for payload type 0 as PCMU with a sampling rate of 8000 Hz.
    • a=rtpmap:34 H263/90000: This specifies the codec for payload type 34 as H.263 with a clock rate of 90000 Hz.
    • a=sendrecv: This specifies that the media stream is bi-directional (send and receive).
    • a=rtcp-mux: This specifies that the RTP and RTCP packets are multiplexed on the same port.
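
To make the line-by-line structure above concrete, here is a minimal Python sketch that splits an SDP body into session-level fields and media sections. The names Media, parse_sdp, and ORIGIN_FIELDS are invented for this example. It assumes well-formed input, keeps only a= lines at media level, and is not a validating parser.

```python
from dataclasses import dataclass, field

# Field names of the o= line, in order (underscored for use as dict keys).
ORIGIN_FIELDS = ("username", "sess_id", "sess_version",
                 "nettype", "addrtype", "unicast_address")


@dataclass
class Media:
    kind: str        # "audio", "video", ...
    port: int
    proto: str       # e.g. "RTP/AVP"
    formats: list    # payload types as strings, in order of preference
    attributes: list = field(default_factory=list)


def parse_sdp(text: str):
    """Return (session_fields, media_list) for an SDP body."""
    session, media = {}, []
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:].strip()
        if key == "m":
            kind, port, proto, *formats = value.split()
            # A port may be written as "<port>/<count>"; keep the base port.
            media.append(Media(kind, int(port.split("/")[0]), proto, formats))
        elif media:
            # Everything after the first m= line belongs to the latest media
            # section; this sketch keeps only its a= lines.
            if key == "a":
                media[-1].attributes.append(value)
        else:
            session.setdefault(key, []).append(value)
    return session, media


example = """v=0
o=- 123456 789012 IN IP4 pc33.atlanta.com
s=SIP Call
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0 8 18 3 101
a=rtpmap:0 PCMU/8000
m=video 49172 RTP/AVP 31 34
a=rtpmap:31 H261/90000
"""
session, media = parse_sdp(example)
origin = dict(zip(ORIGIN_FIELDS, session["o"][0].split()))
print(origin["addrtype"], [(m.kind, m.port, m.formats) for m in media])
```

Keeping the session-level fields in a dictionary of lists reflects the fact that some line types, such as a=, may appear more than once.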

 

RTPMAP

In the context of SDP, RTPMAP (RTP Mapping) is an attribute used to map a particular codec to an RTP payload type number. RTP is the Real-time Transport Protocol, which is used to transmit audio and video over IP networks. RTP uses payload types to identify the format of the data being transmitted.

The RTPMAP attribute provides a way for the sender to tell the receiver which codec each payload type number refers to. The attribute specifies the encoding name, the clock rate of the codec, and, optionally for audio, the number of channels being transmitted.

The syntax for the RTPMAP attribute is as follows:

a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]

where:

  • <payload type> is an integer value representing the payload type number.
  • <encoding name> is a string that identifies the codec being used, such as "PCMU" for G.711 mu-law audio or "H264" for H.264 video.
  • <clock rate> is an integer value indicating the clock rate of the codec, in Hz.
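  • <encoding parameters> is an optional field. For audio streams it gives the number of channels (it may be omitted when there is one channel); it is not used for video. Codec-specific settings such as packetization mode are carried in the fmtp attribute instead.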

Here is an example of an RTPMAP attribute for G.711 mu-law audio:

a=rtpmap:0 PCMU/8000

This specifies that payload type 0 is used for G.711 mu-law audio, with a clock rate of 8000 Hz.

Overall, RTPMAP is an important attribute in SDP as it allows the sender and receiver to negotiate and agree upon the codec to be used for transmitting the media.
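
The syntax above maps naturally onto a regular expression. The following Python sketch shows one way to parse an rtpmap line. The function name and the returned dictionary keys are choices made for this example, and payload type 96 in the usage line is just an illustrative dynamic value.

```python
import re

# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]
RTPMAP_RE = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\S+))?$")


def parse_rtpmap(line: str):
    """Return the parts of an rtpmap line as a dict, or None if it does not match."""
    match = RTPMAP_RE.match(line.strip())
    if match is None:
        return None
    payload_type, name, rate, params = match.groups()
    return {
        "payload_type": int(payload_type),
        "encoding_name": name,
        "clock_rate": int(rate),
        "encoding_params": params,  # e.g. channel count for audio, or None
    }


print(parse_rtpmap("a=rtpmap:0 PCMU/8000"))
print(parse_rtpmap("a=rtpmap:96 opus/48000/2"))
```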

All RTPMAP codecs


Payload type (PT) | Name | Type | No. of channels | Clock rate (Hz) | Frame size (ms) | Default packet interval (ms) | Description | References
0 | PCMU | audio | 1 | 8000 | any | 20 | ITU-T G.711 PCM μ-Law audio 64 kbit/s | RFC 3551
1 | reserved (previously FS-1016 CELP) | audio | 1 | 8000 | | | reserved, previously FS-1016 CELP audio 4.8 kbit/s | RFC 3551, previously RFC 1890
2 | reserved (previously G721 or G726-32) | audio | 1 | 8000 | | | reserved, previously ITU-T G.721 ADPCM audio 32 kbit/s or ITU-T G.726 audio 32 kbit/s | RFC 3551, previously RFC 1890
3 | GSM | audio | 1 | 8000 | 20 | 20 | European GSM Full Rate audio 13 kbit/s (GSM 06.10) | RFC 3551
4 | G723 | audio | 1 | 8000 | 30 | 30 | ITU-T G.723.1 audio | RFC 3551
5 | DVI4 | audio | 1 | 8000 | any | 20 | IMA ADPCM audio 32 kbit/s | RFC 3551
6 | DVI4 | audio | 1 | 16000 | any | 20 | IMA ADPCM audio 64 kbit/s | RFC 3551
7 | LPC | audio | 1 | 8000 | any | 20 | Experimental Linear Predictive Coding audio 5.6 kbit/s | RFC 3551
8 | PCMA | audio | 1 | 8000 | any | 20 | ITU-T G.711 PCM A-Law audio 64 kbit/s | RFC 3551
9 | G722 | audio | 1 | 8000 | any | 20 | ITU-T G.722 audio 64 kbit/s | RFC 3551, Page 14
10 | L16 | audio | 2 | 44100 | any | 20 | Linear PCM 16-bit Stereo audio 1411.2 kbit/s, uncompressed | RFC 3551, Page 27
11 | L16 | audio | 1 | 44100 | any | 20 | Linear PCM 16-bit audio 705.6 kbit/s, uncompressed | RFC 3551, Page 27
12 | QCELP | audio | 1 | 8000 | 20 | 20 | Qualcomm Code Excited Linear Prediction | RFC 2658, RFC 3551
13 | CN | audio | 1 | 8000 | | | Comfort noise. Payload type used with audio codecs that do not support comfort noise as part of the codec itself such as G.711, G.722.1, G.722, G.726, G.727, G.728, GSM 06.10, Siren, and RTAudio. | RFC 3389
14 | MPA | audio | 1, 2 | 90000 | 8–72 | | MPEG-1 or MPEG-2 audio only | RFC 3551, RFC 2250
15 | G728 | audio | 1 | 8000 | 2.5 | 20 | ITU-T G.728 audio 16 kbit/s | RFC 3551
16 | DVI4 | audio | 1 | 11025 | any | 20 | IMA ADPCM audio 44.1 kbit/s | RFC 3551
17 | DVI4 | audio | 1 | 22050 | any | 20 | IMA ADPCM audio 88.2 kbit/s | RFC 3551
18 | G729 | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 and G.729a audio 8 kbit/s; Annex B is implied unless the annexb=no parameter is used | RFC 3551, Page 20, RFC 3555, Page 15
19 | reserved (previously CN) | audio | | | | | reserved, previously comfort noise | RFC 3551
25 | CELLB | video | | 90000 | | | Sun CellB video | RFC 2029
26 | JPEG | video | | 90000 | | | JPEG video | RFC 2435
28 | nv | video | | 90000 | | | Xerox PARC's Network Video (nv) | RFC 3551, Page 32
31 | H261 | video | | 90000 | | | ITU-T H.261 video | RFC 4587
32 | MPV | video | | 90000 | | | MPEG-1 and MPEG-2 video | RFC 2250
33 | MP2T | audio/video | | 90000 | | | MPEG-2 transport stream | RFC 2250
34 | H263 | video | | 90000 | | | H.263 video, first version (1996) | RFC 3551, RFC 2190
72–76 | reserved | | | | | | reserved because RTCP packet types 200–204 would otherwise be indistinguishable from RTP payload types 72–76 with the marker bit set | RFC 3550, RFC 3551
77–95 | unassigned | | | | | | note that RTCP packet type 207 (XR, Extended Reports) would be indistinguishable from RTP payload type 79 with the marker bit set | RFC 3551, RFC 3611
dynamic | H263-1998 | video | | 90000 | | | H.263 video, second version (1998) | RFC 3551, RFC 4629, RFC 2190
dynamic | H263-2000 | video | | 90000 | | | H.263 video, third version (2000) | RFC 4629
dynamic (or profile) | H264 AVC | video | | 90000 | | | H.264 video (MPEG-4 Part 10) | RFC 6184, previously RFC 3984
dynamic (or profile) | H264 SVC | video | | 90000 | | | H.264 video | RFC 6190
dynamic (or profile) | H265 | video | | 90000 | | | H.265 video (HEVC) | RFC 7798
dynamic (or profile) | theora | video | | 90000 | | | Theora video | draft-barbato-avt-rtp-theora
dynamic | iLBC | audio | 1 | 8000 | 20, 30 | 20, 30 | Internet low Bitrate Codec 13.33 or 15.2 kbit/s | RFC 3952
dynamic | PCMA-WB | audio | 1 | 16000 | 5 | | ITU-T G.711.1 A-law | RFC 5391
dynamic | PCMU-WB | audio | 1 | 16000 | 5 | | ITU-T G.711.1 μ-law | RFC 5391
dynamic | G718 | audio | | 32000 (placeholder) | 20 | | ITU-T G.718 | draft-ietf-payload-rtp-g718
dynamic | G719 | audio | (various) | 48000 | 20 | | ITU-T G.719 | RFC 5404
dynamic | G7221 | audio | | 16000, 32000 | 20 | | ITU-T G.722.1 and G.722.1 Annex C | RFC 5577
dynamic | G726-16 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 16 kbit/s | RFC 3551
dynamic | G726-24 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 24 kbit/s | RFC 3551
dynamic | G726-32 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 32 kbit/s | RFC 3551
dynamic | G726-40 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 40 kbit/s | RFC 3551
dynamic | G729D | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 Annex D | RFC 3551
dynamic | G729E | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 Annex E | RFC 3551
dynamic | G7291 | audio | | 16000 | 20 | | ITU-T G.729.1 | RFC 4749
dynamic | GSM-EFR | audio | 1 | 8000 | 20 | 20 | ITU-T GSM-EFR (GSM 06.60) | RFC 3551
dynamic | GSM-HR-08 | audio | 1 | 8000 | 20 | | ITU-T GSM-HR (GSM 06.20) | RFC 5993
dynamic (or profile) | AMR | audio | (various) | 8000 | 20 | | Adaptive Multi-Rate audio | RFC 4867
dynamic (or profile) | AMR-WB | audio | (various) | 16000 | 20 | | Adaptive Multi-Rate Wideband audio (ITU-T G.722.2) | RFC 4867
dynamic (or profile) | AMR-WB+ | audio | 1, 2 or omit | 72000 | 13.3–40 | | Extended Adaptive Multi Rate – WideBand audio | RFC 4352
dynamic (or profile) | vorbis | audio | (various) | (various) | | | Vorbis audio | RFC 5215
dynamic (or profile) | opus | audio | 1, 2 | 48000 | 2.5–60 | 20 | Opus audio | RFC 7587
dynamic (or profile) | speex | audio | 1 | 8000, 16000, 32000 | 20 | | Speex audio | RFC 5574
dynamic | mpa-robust | audio | 1, 2 | 90000 | 24–72 | | Loss-Tolerant MP3 audio | RFC 5219 (previously RFC 3119)
dynamic (or profile) | MP4A-LATM | audio | | 90000 or others | | | MPEG-4 Audio (includes AAC) | RFC 6416 (previously RFC 3016)
dynamic (or profile) | MP4V-ES | video | | 90000 or others | | | MPEG-4 Visual | RFC 6416 (previously RFC 3016)
dynamic (or profile) | mpeg4-generic | audio/video | | 90000 or other | | | MPEG-4 Elementary Streams | RFC 3640
dynamic | VP8 | video | | 90000 | | | VP8 video | RFC 7741
dynamic | VP9 | video | | 90000 | | | VP9 video | draft-ietf-payload-vp9
dynamic | L8 | audio | (various) | (various) | any | 20 | Linear PCM 8-bit audio with 128 offset | RFC 3551 Section 4.5.10 and Table 5
dynamic | DAT12 | audio | (various) | (various) | any | 20 (by analogy with L16) | IEC 61119 12-bit nonlinear audio | RFC 3190 Section 3
dynamic | L16 | audio | (various) | (various) | any | 20 | Linear PCM 16-bit audio | RFC 3551 Section 4.5.11, RFC 2586
dynamic | L20 | audio | (various) | (various) | any | 20 (by analogy with L16) | Linear PCM 20-bit audio | RFC 3190 Section 4
dynamic | L24 | audio | (various) | (various) | any | 20 (by analogy with L16) | Linear PCM 24-bit audio | RFC 3190 Section 4
dynamic | raw | video | | 90000 | | | Uncompressed Video | RFC 4175
dynamic | ac3 | audio | (various) | 32000, 44100, 48000 | | | Dolby AC-3 audio | RFC 4184
dynamic | eac3 | audio | (various) | 32000, 44100, 48000 | | | Enhanced AC-3 audio | RFC 4598
dynamic | t140 | text | | 1000 | | | Text over IP | RFC 4103
dynamic | EVRC, EVRC0, EVRC1 | audio | | 8000 | | | EVRC audio | RFC 4788
dynamic | EVRCB, EVRCB0, EVRCB1 | audio | | 8000 | | | EVRC-B audio | RFC 4788
dynamic | EVRCWB, EVRCWB0, EVRCWB1 | audio | | 16000 | | | EVRC-WB audio | RFC 5188
dynamic | jpeg2000 | video | | 90000 | | | JPEG 2000 video | RFC 5371
dynamic | UEMCLIP | audio | | 8000, 16000 | | | UEMCLIP audio | RFC 5686
dynamic | ATRAC3 | audio | | 44100 | | | ATRAC3 audio | RFC 5584
dynamic | ATRAC-X | audio | | 44100, 48000 | | | ATRAC3+ audio | RFC 5584
dynamic | ATRAC-ADVANCED-LOSSLESS | audio | | (various) | | | ATRAC Advanced Lossless audio | RFC 5584
dynamic | DV | video | | 90000 | | | DV video | RFC 6469 (previously RFC 3189)
dynamic | BT656 | video | | | | | ITU-R BT.656 video | RFC 3555
dynamic | BMPEG | video | | | | | Bundled MPEG-2 video | RFC 2343
dynamic | SMPTE292M | video | | | | | SMPTE 292M video | RFC 3497
dynamic | RED | audio | | | | | Redundant Audio Data | RFC 2198
dynamic | VDVI | audio | | | | | Variable-rate DVI4 audio | RFC 3551
dynamic | MP1S | video | | | | | MPEG-1 Systems Streams video | RFC 2250
dynamic | MP2P | video | | | | | MPEG-2 Program Streams video | RFC 2250
dynamic | tone | audio | | 8000 (default) | | | tone | RFC 4733
dynamic | telephone-event | audio | | 8000 (default) | | | DTMF tone | RFC 4733
dynamic | aptx | audio | 2 – 6 | (equal to sampling rate) | 4000 ÷ sample rate | 4 | aptX audio | RFC 7310
dynamic | jxsv | video | | 90000 | | | JPEG XS video | RFC 9134
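
One practical consequence of this table is that static payload types can be understood without an rtpmap line, while dynamic ones (96 to 127 under RFC 3551) always need one. The sketch below illustrates that lookup order in Python. It hard-codes only a handful of the static entries from the table, and the names STATIC_PAYLOAD_TYPES and resolve_codec are invented for the example.

```python
# A small subset of the static assignments listed in the table above.
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000),
    3: ("GSM", 8000),
    4: ("G723", 8000),
    8: ("PCMA", 8000),
    9: ("G722", 8000),
    18: ("G729", 8000),
    31: ("H261", 90000),
    34: ("H263", 90000),
}


def resolve_codec(payload_type, rtpmap=None):
    """Return (encoding name, clock rate) for a payload type, or None.

    `rtpmap` holds mappings taken from a=rtpmap lines, keyed by payload type.
    An explicit rtpmap entry wins; otherwise fall back to the static table.
    """
    rtpmap = rtpmap or {}
    if payload_type in rtpmap:
        return rtpmap[payload_type]
    return STATIC_PAYLOAD_TYPES.get(payload_type)


signalled = {101: ("telephone-event", 8000)}
print(resolve_codec(0, signalled))    # static entry, no rtpmap needed
print(resolve_codec(101, signalled))  # dynamic entry, taken from rtpmap
print(resolve_codec(96, signalled))   # dynamic and not signalled: unknown
```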

 

FMTP

The "fmtp" (format parameters) attribute in SDP (Session Description Protocol) is used to convey format-specific parameters for a media format, such as codec options that SDP itself does not need to understand. The parameters are commonly written as name-value pairs separated by semicolons, although the exact syntax is defined by each payload format (for example, telephone-event uses a list of event codes such as 0-16).

The format of the "fmtp" attribute is as follows:

a=fmtp:<format> <parameter_name>=<value>;<parameter_name>=<value>;...
  • <format>: specifies the media format to which the parameters apply.
  • <parameter_name>: specifies the name of the parameter.
  • <value>: specifies the value of the parameter.

For example, the "fmtp" attribute for the H.264 video codec may look like:

a=fmtp:120 profile-level-id=42801E; packetization-mode=1; sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==

In this example, the <format> value is 120, a dynamic payload type that a matching rtpmap line (such as a=rtpmap:120 H264/90000) would map to H.264. The <parameter_name> values include "profile-level-id", "packetization-mode", and "sprop-parameter-sets", and their corresponding <value>s are "42801E", "1", and "Z0IACpZTBYmI,aMljiA==", respectively. These parameters provide information such as the profile and level of the codec, the packetization mode, and the parameter sets required for decoding the video stream.

The "fmtp" attribute is typically used in conjunction with the "rtpmap" attribute to describe media formats that use RTP (Real-time Transport Protocol) for transmission.
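
As a rough sketch of how the name-value form could be parsed, the Python function below splits an fmtp line into its format and a dictionary of parameters. The function name is made up for this example, and it assumes the semicolon-separated layout shown above. A value with no "=" (such as the 0-16 event list used by telephone-event) is stored as a key with a value of None.

```python
def parse_fmtp(line: str):
    """Split 'a=fmtp:<format> name=value;name=value' into (format, params)."""
    prefix = "a=fmtp:"
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute: " + line)
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, so base64 padding in values survives.
        name, sep, value = item.partition("=")
        params[name.strip()] = value.strip() if sep else None
    return fmt, params


fmt, params = parse_fmtp(
    "a=fmtp:120 profile-level-id=42801E; packetization-mode=1; "
    "sprop-parameter-sets=Z0IACpZTBYmI,aMljiA=="
)
print(fmt, params)
```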

 

This is a SIP response to the INVITE request shown above. It has a status code of 200 OK, indicating that the request was successful. The Via header is copied from the request and is used to route the response back along the path the request followed. The To header now carries a tag added by the called party (Bob), and the From header carries the tag chosen by the caller (Alice); together with the Call-ID, these two tags identify the dialog. The CSeq header contains the sequence number and method of the request being answered. The Contact header specifies the address to which subsequent requests in the dialog (such as ACK and BYE) should be sent. The Content-Type header indicates that the body of the message is in SDP format. The SDP message body includes connection information, timing information, and media information for both audio and video. In this response, some codecs have been removed from both the audio and video streams, leaving only the formats the answerer accepts.

SIP/2.0 200 OK
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
To: Bob <sip:[email protected]>;tag=2482893830n
From: Alice <sip:[email protected]>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:[email protected]:5060>
Content-Type: application/sdp
Content-Length: 204

v=0
o=- 123456 789012 IN IP4 192.0.2.4
s=SIP Call
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0 18
a=rtpmap:0 PCMU/8000
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
m=video 49172 RTP/AVP 31
a=rtpmap:31 H261/90000

The fields of the SDP body in this response are:

  • v=0: The protocol version being used is 0.

  • o=- 123456 789012 IN IP4 192.0.2.4: This is the origin field. The - in the username position means that no username is given. 123456 is the session ID and 789012 is the session version number. IN stands for "Internet", IP4 is the address type, and 192.0.2.4 is the address of the host that created this session description.

  • s=SIP Call: The session name. The s= line is required in SDP, although its content is free-form text. In this case, it is "SIP Call".

  • c=IN IP4 192.0.2.1: The connection information for the session. This field specifies the network type, address type, and address information. In this case, it indicates that the session is using the Internet network, IPv4 address, and the address is 192.0.2.1.

  • t=0 0: The timing information for the session, given as a start time and a stop time. A stop time of 0 means the session is not bounded, and when the start time is also 0 the session is regarded as permanent. This is the usual value in SIP calls, where the lifetime of the session is controlled by SIP signaling rather than by SDP.

  • m=audio 49170 RTP/AVP 0 18: This is the media description for the audio stream. audio indicates that this is an audio stream. 49170 is the port number being used for the RTP stream. RTP/AVP specifies that the stream uses RTP with the Audio/Video Profile (AVP) defined in RFC 3551. 0 and 18 are the RTP payload types being used for this stream.

  • a=rtpmap:0 PCMU/8000: This attribute maps the payload type 0 to the audio codec PCMU and indicates that the audio is sampled at a rate of 8000 Hz.

  • a=rtpmap:18 G729/8000: This attribute maps the payload type 18 to the audio codec G.729 and indicates that the audio is sampled at a rate of 8000 Hz.

  • a=fmtp:18 annexb=no: This is an optional attribute that provides additional parameters for the G.729 codec. In this case, it specifies that Annex B (silence suppression with comfort noise) is not used.

  • m=video 49172 RTP/AVP 31: This is the media description for the video stream. video indicates that this is a video stream. 49172 is the port number being used for the RTP stream. RTP/AVP specifies that the stream uses RTP with the Audio/Video Profile (AVP). 31 is the RTP payload type being used for this stream.

  • a=rtpmap:31 H261/90000: This attribute maps the payload type 31 to the video codec H.261 and indicates that its RTP timestamp clock runs at 90000 Hz.
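
To see which formats the answer dropped relative to the offer, the m= lines of the two bodies can be compared. The Python sketch below does this for the INVITE and 200 OK shown in this section. The function names are invented for the example, and it assumes at most one m= line per media type, which holds here but not in general.

```python
def media_formats(sdp: str):
    """Map each media type to the payload types listed on its m= line."""
    result = {}
    for line in sdp.splitlines():
        if line.startswith("m="):
            kind, _port, _proto, *formats = line[2:].split()
            result[kind] = formats
    return result


def removed_formats(offer: str, answer: str):
    """Return, per media type, the offered payload types missing from the answer."""
    offered, answered = media_formats(offer), media_formats(answer)
    return {
        kind: [pt for pt in formats if pt not in answered.get(kind, [])]
        for kind, formats in offered.items()
    }


# Only the m= lines of the offer and answer matter for this comparison.
offer = "m=audio 49170 RTP/AVP 0 8 18 3 101\nm=video 49172 RTP/AVP 31 34\n"
answer = "m=audio 49170 RTP/AVP 0 18\nm=video 49172 RTP/AVP 31\n"
print(removed_formats(offer, answer))
# {'audio': ['8', '3', '101'], 'video': ['34']}
```

Here the answerer kept PCMU and G.729 for audio and H.261 for video, and dropped PCMA, GSM, telephone-event, and H.263.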