SIP (Session Initiation Protocol) provides a shared language, or signaling protocol, for devices, servers and apps. Software uses this language to set up, manage and end communication sessions, such as phone calls, across any IP network, including the Internet.
SIP does not define the format of the actual audio or video sent between the endpoints. How audio and video are encoded (their “packaging”) is defined by other standards, known as codecs, such as G.711 and G.729, which offer different levels of compression and quality.
SIP only provides a way for two independent endpoints to establish contact, agree on which standard to use for the media (voice, video, etc.), and give each other the addresses (IP addresses and ports) to which the other party should send its media.
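In practice, the “which codec, which address” part is usually carried as an SDP (Session Description Protocol, RFC 4566) body inside the SIP messages. The sketch below builds a minimal SDP audio offer, reads the media address and port back out, and picks the first offered codec the other side supports. It assumes IPv4, the static RTP payload types for G.711 µ-law (PCMU, 0) and G.729 (18), and example addresses from documentation ranges. It is a simplification, not a full offer/answer implementation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Static RTP payload types (RFC 3551) for the two codecs named in the text.
STATIC_PAYLOAD_TYPES = {"PCMU": 0, "G729": 18}


def build_sdp_offer(ip: str, rtp_port: int, codecs: list[str]) -> str:
    """Build a minimal SDP body offering audio at ip:rtp_port with the given codecs."""
    pts = [STATIC_PAYLOAD_TYPES[c] for c in codecs]
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",  # where the other party should send media
        "t=0 0",
        "m=audio " + str(rtp_port) + " RTP/AVP " + " ".join(str(pt) for pt in pts),
    ]
    lines += [f"a=rtpmap:{STATIC_PAYLOAD_TYPES[c]} {c}/8000" for c in codecs]
    return "\r\n".join(lines) + "\r\n"  # SDP lines end with CRLF


@dataclass
class MediaTarget:
    ip: str
    port: int
    payload_types: list[int]


def parse_sdp(sdp: str) -> MediaTarget:
    """Extract the connection address, audio port and offered payload types."""
    ip, port, pts = "", 0, []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[-1]
        elif line.startswith("m=audio "):
            fields = line.split()
            port = int(fields[1])
            pts = [int(p) for p in fields[3:]]
    return MediaTarget(ip, port, pts)


def choose_codec(offered: list[int], supported: set[int]) -> int | None:
    """Pick the first offered payload type we support, or None if there is no overlap."""
    for pt in offered:
        if pt in supported:
            return pt
    return None
```

For example, `parse_sdp(build_sdp_offer("192.0.2.10", 49170, ["PCMU", "G729"]))` tells the receiver to send media to 192.0.2.10:49170, and a device that only supports G.729 would pick payload type 18.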
Connect any endpoints
The SIP standard allows you to connect any SIP-enabled software or hardware you have. This can include desktop phones, mobile phone apps, business phone switches (PBXs) and specialized server software, allowing calls and other communication sessions to flow freely.
It also allows you to connect these endpoints to third-party service providers across the Internet, for services such as voice response or onward connection to the traditional telephone network (PSTN).
Firewalls and SIP
The promise of a standard language for communication between any devices on the Internet may sound great.
Reality, however, is a bit less rosy.
Firewalls are designed to block incoming traffic from the Internet from reaching computers and devices unless the type of traffic and its destination are explicitly allowed. SIP traffic is usually not allowed unless you change the firewall configuration, which is not a task for the faint-hearted.
Local data networks behind firewalls usually use local (private) IP addresses and jointly share one or a few public IP addresses that actually work on the Internet. NAT (Network Address Translation) is a function in the firewall or Internet router that maps the public addresses to the local ones. This is an additional headache for SIP traffic, as SIP endpoints need to figure out what their public address actually is before asking a remote endpoint to start sending media to them.
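To make the problem concrete, here is a heavily simplified, hypothetical NAT model (the class and its port-allocation scheme are illustrative assumptions, not how any particular router works). Outgoing packets create a mapping from a private address and port to a port on the shared public IP; incoming packets are only let through if they hit a port that already has a mapping.

```python
from __future__ import annotations


class ToyNat:
    """Hypothetical, simplified NAT: many private (ip, port) pairs share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound: dict[tuple[str, int], int] = {}  # (private ip, port) -> public port
        self.inbound: dict[int, tuple[str, int]] = {}   # public port -> (private ip, port)

    def translate_outgoing(self, src_ip: str, src_port: int) -> tuple[str, int]:
        """Rewrite the source of an outgoing packet, creating a mapping on first use."""
        key = (src_ip, src_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_incoming(self, dst_port: int) -> tuple[str, int] | None:
        """Find the private destination for an incoming packet; None means it is dropped."""
        return self.inbound.get(dst_port)
```

A phone at 192.168.1.20:5060 only knows its private address, but the rest of the Internet sees it as 203.0.113.5:40000 (example addresses). If the phone advertises 192.168.1.20 as its media address, the remote party sends audio to an address it cannot reach, and anything arriving at an unmapped public port is simply dropped.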
The most common approach to getting around this problem (known as NAT traversal) is to use a STUN server on the public Internet, often provided by your service provider. The SIP device asks the STUN server which public IP address and port the firewall has given it, so that it can share this address with other devices that may want to call it later. Regular STUN traffic also helps keep a “pinhole” open in the firewall, allowing incoming SIP traffic from remote devices.
Firewalls also commonly have built-in ALG (Application Level Gateway) functionality that identifies the type of data being sent, such as SIP, and tries to make helpful changes to the actual data. In many cases this has turned out to be not so helpful: the ALG silently rewrites the data in ways that break the signaling. ALGs are usually proprietary software and poorly documented.
Over the years, SIP has grown into a rich protocol with a lot of nitty-gritty options. Vendors of SIP hardware and software usually have their own variants, preferences and bugs when they implement SIP. This means that connecting two devices from different vendors is not always as straightforward as you would like.
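The STUN exchange itself is small. Below is a sketch of a STUN Binding request and of decoding the XOR-MAPPED-ADDRESS attribute from the reply, following the message layout in RFC 5389. It handles IPv4 only, does no retransmission or message-integrity checks, and the server name you would pass to `discover_public_address` (a hypothetical helper) is up to you; 3478 is the standard STUN port.

```python
from __future__ import annotations

import os
import socket
import struct

# Constants from RFC 5389 (STUN).
MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id: bytes) -> bytes:
    """20-byte STUN header: type, body length (0), magic cookie, 96-bit transaction ID."""
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txn_id)


def parse_xor_mapped_address(response: bytes, txn_id: bytes) -> tuple[str, int]:
    """Return the (public IPv4 address, port) the STUN server saw for us."""
    msg_type, length, cookie, rid = struct.unpack("!HHI12s", response[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or rid != txn_id:
        raise ValueError("not a matching Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            if value[1] != 0x01:  # family byte: 0x01 means IPv4
                raise ValueError("this sketch only handles IPv4")
            (xport,) = struct.unpack("!H", value[2:4])
            (xaddr,) = struct.unpack("!I", value[4:8])
            port = xport ^ (MAGIC_COOKIE >> 16)  # port is XORed with the cookie's top 16 bits
            addr = xaddr ^ MAGIC_COOKIE          # IPv4 address is XORed with the whole cookie
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


def discover_public_address(server: str, port: int = 3478, timeout: float = 2.0) -> tuple[str, int]:
    """Send one Binding request over UDP and return our public (address, port)."""
    txn_id = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(txn_id), (server, port))
        data, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(data, txn_id)
```

Because a NAT mapping is tied to the local source port, a real SIP device would send its STUN requests from the same socket it uses for SIP or media, rather than from a fresh socket as this helper does.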
Difficult to troubleshoot
Connecting SIP software and devices from major providers together usually works just fine.
But when it doesn’t – it is usually a beast to figure out why.
Each component in the communication path (the two endpoints, plus the firewall and router with NAT and ALG in between) may work perfectly fine on its own, yet together they can still end up with a combination of subtle differences that cause issues such as one-way audio or calls that fail to set up.
Working these out usually requires a bit of Sherlock Holmes detective work.
The main components of SIP
When talking about SIP, it is important to know some of the basic terms. Here is a quick summary of the lingo you need to know.
A user agent is an endpoint for communication, such as a desktop phone or an app.
A proxy server is software that sits between user agents and forwards SIP messages between them. A proxy can, for example, provide load balancing in environments with many calls, such as a call center, so that incoming traffic is spread across servers. It can also fork calls, so that they ring on multiple endpoints at the same time.
A back-to-back user agent (B2BUA) is a software function that SIP traffic flows through when connecting sessions between two endpoints (user agents). It acts as a user agent toward each side, so it can apply various types of business logic to the traffic, such as starting centralized recording or diverting sessions elsewhere.
A SIP registrar is a function that lets SIP-enabled devices register and say “hey, I’m available for sessions at this IP address”. Other devices can then use the address and status information gathered by the registrar to reach the registered devices when a call or session is to be started.
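Parallel forking boils down to “ring everyone, keep the first answer, cancel the rest”. The asyncio simulation below models only that timing logic: the endpoint names and answer delays are hypothetical, and real SIP involves INVITE, 200 OK, ACK and CANCEL messages that are not modelled here.

```python
from __future__ import annotations

import asyncio


async def ring(endpoint: str, answer_after: float | None) -> str:
    """Hypothetical endpoint: 'answers' after a delay, or never if answer_after is None."""
    if answer_after is None:
        await asyncio.Event().wait()  # nobody picks up; waits until cancelled
    else:
        await asyncio.sleep(answer_after)
    return endpoint


async def fork_call(targets: dict[str, float | None], timeout: float) -> str | None:
    """Ring every target in parallel; the first to answer wins, the others are cancelled."""
    if not targets:
        return None
    tasks = {asyncio.create_task(ring(name, delay)) for name, delay in targets.items()}
    done, pending = await asyncio.wait(
        tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # in SIP, the proxy would send CANCEL on these branches
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result() if done else None
```

For example, `asyncio.run(fork_call({"desk": 0.2, "mobile": 0.05, "softphone": None}, timeout=1.0))` returns `"mobile"`, and the desk phone and softphone stop ringing.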
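At its core, a registrar keeps a table from a user’s address-of-record to the contact addresses the user’s devices registered, each valid for a limited time (the registration’s expiry, where an expiry of 0 removes a binding). The in-memory class below is a hypothetical sketch of that table only; it does not parse SIP REGISTER messages or authenticate anyone, and the example URIs are made up.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Registrar:
    """Toy location service: address-of-record -> {contact URI: absolute expiry time}."""

    bindings: dict[str, dict[str, float]] = field(default_factory=dict)

    def register(self, aor: str, contact: str, expires: int, now: float | None = None) -> None:
        """Add or refresh a binding; expires == 0 removes it."""
        now = time.monotonic() if now is None else now
        contacts = self.bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires

    def lookup(self, aor: str, now: float | None = None) -> list[str]:
        """Return the contacts that are still registered (not yet expired), sorted."""
        now = time.monotonic() if now is None else now
        contacts = self.bindings.get(aor, {})
        return sorted(c for c, exp in contacts.items() if exp > now)
```

When a call for `sip:alice@example.com` arrives, a proxy would look up the live contacts and, if there are several, could fork the call to all of them.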
A great standard
While this article has covered several of the challenges and some of the complexity of SIP, it is important to note that SIP is a highly reliable standard that handles millions of calls and communication sessions around the world every day.
It is a mature communication standard that generally works right out of the box.
It allows you to connect the equipment you have to a plethora of services from vendors worldwide, giving your business a high-quality, low-cost service.