Challenges of Network Slicing

Neil Davies and Peter Thompson, Predictable Network Solutions

There are a lot of technology buzzwords about ‘network slicing’, but what do they really mean? In particular, what does a good ‘network slicing’ solution look like? What do we want from it? What capabilities are needed to deliver these requirements?

Properties of slicing

Isolation

First of all, what we want is isolation of one set of users/services/applications from another. This isolation has two parts: isolation of connectivity; and isolation of performance. Let’s take those one at a time.

Isolation of connectivity means that, by default, connectivity within a slice is managed within it, but connectivity between slices needs appropriate intervention from outside both slices. This simply extends the old idea of ‘closed user groups’.
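
As a rough illustration of this default, the sketch below models slices as closed user groups: endpoints in the same slice may connect, while endpoints in different slices may connect only after an explicit approval recorded from outside both slices. The class and method names (SliceRegistry, approve_inter_slice, can_connect) are hypothetical and are not taken from any particular slicing system.

```python
class SliceRegistry:
    """Hypothetical registry mapping endpoints to slices (illustrative only)."""

    def __init__(self):
        self._slice_of = {}            # endpoint name -> slice name
        self._approved_links = set()   # unordered pairs of slices allowed to interconnect

    def add_endpoint(self, endpoint, slice_name):
        self._slice_of[endpoint] = slice_name

    def approve_inter_slice(self, slice_a, slice_b):
        # Assumed to be called by an authority outside both slices.
        self._approved_links.add(frozenset((slice_a, slice_b)))

    def can_connect(self, src, dst):
        src_slice = self._slice_of[src]
        dst_slice = self._slice_of[dst]
        if src_slice == dst_slice:
            return True  # connectivity inside a slice is managed within it
        return frozenset((src_slice, dst_slice)) in self._approved_links


registry = SliceRegistry()
registry.add_endpoint("ward-monitor", "medical")
registry.add_endpoint("nurse-station", "medical")
registry.add_endpoint("thermostat", "smart-home")

same_slice = registry.can_connect("ward-monitor", "nurse-station")
cross_before = registry.can_connect("ward-monitor", "thermostat")
registry.approve_inter_slice("medical", "smart-home")
cross_after = registry.can_connect("ward-monitor", "thermostat")
```

Here cross-slice connectivity is refused until the external approval is recorded, which is the ‘closed user group’ behaviour described above.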

Isolation of performance is a bit more complex; short of TDM (Time Division Multiplexing), it’s impossible to completely prevent one packet flow from being affected by others sharing the same resources, so the question of how to specify how much it can be affected is critical. Simply partitioning bandwidth is not enough, because ‘bandwidth’ is an average measure over many packet times, and applications are sensitive in different degrees to packet delays and drops over much shorter timescales. A good slicing solution must therefore allow requirements over different timescales to be expressed and enforced (or refused, if they’re infeasible). Also, performance is an end-to-end issue, so there needs to be a way to manage the impact of resource sharing all along the path, even when it crosses management domains.
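
To see why an average such as bandwidth is not enough, consider the minimal sketch below. It assumes a toy model that is not from the article: time is divided into ticks, a link serves one packet per tick, and two hypothetical flows each send 100 packets over 100 ticks. Both have the same average rate, yet only one of them builds up a queue, and hence delay.

```python
def backlog_per_tick(arrivals, service_per_tick=1):
    """Backlog left in a FIFO queue at the end of each tick (toy model)."""
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog = max(0, backlog + arriving - service_per_tick)
        history.append(backlog)
    return history


# Two hypothetical flows with the same average rate of 1 packet per tick.
smooth = [1] * 100               # one packet every tick
bursty = ([10] + [0] * 9) * 10   # ten packets at once, every tenth tick

avg_smooth = sum(smooth) / len(smooth)
avg_bursty = sum(bursty) / len(bursty)

max_backlog_smooth = max(backlog_per_tick(smooth))
max_backlog_bursty = max(backlog_per_tick(bursty))
```

In this toy model the smooth flow never queues, while the bursty flow repeatedly leaves nine packets waiting, even though a bandwidth figure would describe the two flows identically.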

Assurance

Secondly, there needs to be some means to assure the isolation, both of connectivity and performance, otherwise it can’t be trusted. What this needs to be will depend on the regulatory framework and the level of trust required - a critical medical service will have different requirements from a smart home sensor array, for example, which will be different again from watching cat videos.

Scalability

Thirdly, all of this needs to be deployable at minimal effort, otherwise the costs will outweigh the benefits. Requirements need to be expressed at a high level, with lower-level configurations automatically generated to deliver the isolation requested. This will create new opportunities for dynamically-provisioned services that are simply infeasible today.

Reusability

Fourthly, a good slicing solution needs to cover the full range of use cases to harvest economies of scale. These include: incumbent telco offering wholesale services; multiple wholesalers sharing underlying infrastructure; and multiple infrastructure owners with irrevocable rights to use. Aggregators, for example multinationals, could then buy ‘slices’ from multiple sources and stitch them together (as they do today with MPLS or carrier Ethernet solutions).

Capabilities needed

What is needed to deliver these requirements?

With regard to connectivity isolation, a scalable slicing solution will require a rich set of association management capabilities including the capacity to delegate management of subsections of the slicing hierarchy.

In terms of performance isolation, the ability to ‘trade’ requirements of different flows on multiple timescales and to handle the impact of overbooking of resources, both within and between slices, will be critical for an efficient solution. Reusability demands that this operates over the full range of potential bearers (e.g. home legacy WiFi up to 100G+ fibres) and the full spectrum of application requirements (e.g. from once-a-week IoT reporting to immersive VR). Allowing overbooking is essential for economic operation, but managing the risk this creates is also essential for delivering reliable services. This management needs to be dynamic, automatic, and scalable to avoid becoming a bottleneck. The assurance mechanism needs to monitor how often overbooking risks mature, both to inform the ‘users’ and to guide capacity planning. To manage performance along a path will require a mechanism to signal requirements and hazards so that these can be negotiated, either via a central orchestration mechanism or by distributed ‘choreography’.
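
One simple way to put a number on an overbooking risk is sketched below. It rests on strong assumptions that are ours, not the article's: each of n flows is independently active with the same probability, each active flow needs one unit of capacity, and the link has a fixed number of units. The function name overbooking_risk and its parameters are hypothetical.

```python
from math import comb


def overbooking_risk(n_flows: int, p_active: float, capacity: int) -> float:
    """Probability that more than `capacity` of `n_flows` are active at once.

    Assumes independent, identically behaving flows (a binomial model),
    which real traffic will only approximate.
    """
    return sum(
        comb(n_flows, k) * p_active**k * (1 - p_active) ** (n_flows - k)
        for k in range(capacity + 1, n_flows + 1)
    )


# Hypothetical example: 100 flows, each active 10% of the time, sold onto
# a link that can carry only 20 of them simultaneously.
risk = overbooking_risk(n_flows=100, p_active=0.1, capacity=20)
```

A monitoring system could compare an estimate like this with how often overload is actually observed, which is one way of tracking how often overbooking risks mature.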

To assure connectivity, policies need to be formulated and captured in a way that they can be automatically validated, which can then help manage the inevitable mis-configurations and operational failures.

Assuring performance requires a quantitative definition of performance requirements and mechanisms to measure that the requirements of individual flows and flow-aggregates are being delivered.
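
One possible shape for such a quantitative definition is a set of delay bounds, each paired with the minimum fraction of packets that must arrive within it, with lost packets counting against every bound. The sketch below checks measured delays against a requirement of that form. The representation, the function names and the sample values are illustrative assumptions, not the method used in the work cited in this article.

```python
from typing import Optional, Sequence, Tuple


def fraction_within(delays_ms: Sequence[Optional[float]], bound_ms: float) -> float:
    """Fraction of all sent packets delivered within bound_ms (None = lost)."""
    on_time = sum(1 for d in delays_ms if d is not None and d <= bound_ms)
    return on_time / len(delays_ms)


def meets_requirement(
    delays_ms: Sequence[Optional[float]],
    requirement: Sequence[Tuple[float, float]],
) -> bool:
    """True if every (bound_ms, min_fraction) pair in the requirement is met."""
    return all(
        fraction_within(delays_ms, bound_ms) >= min_fraction
        for bound_ms, min_fraction in requirement
    )


# Hypothetical measurements for one flow: delays in ms, None for a lost packet.
measured = [5, 8, 9, 12, 15, 18, 19, 25, None, 7]

relaxed = [(10, 0.4), (20, 0.8)]   # 40% within 10 ms, 80% within 20 ms
strict = [(10, 0.5), (20, 0.8)]    # 50% within 10 ms, 80% within 20 ms

relaxed_ok = meets_requirement(measured, relaxed)
strict_ok = meets_requirement(measured, strict)
```

With these sample values the relaxed requirement is met and the strict one is not, because only four of the ten packets arrive within 10 ms.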

For scalability it is essential to automate the provisioning of new slices (minutes not weeks), and delegate management within a slice to automated and policy-driven mechanisms. Establishing connections between slices might necessitate human sign-off. For performance management to be scalable, performance requirements (not just bandwidth - or any other average measure) have to be “aggregatable” in order to contain the complexity, and performance assurance needs to be low-impact.

To maximize the value of all these capabilities and exploit the potential for service/application innovation, there should be a low barrier to entry. The goal should be to deliver a set of capabilities that applications can re-use rather than re-inventing them at another level. For example, many applications introduce functionality then add the ability to create closed user groups (WhatsApp etc.). It should be attractive to instead start with a network slice (which gives the closed user group and performance isolation) and add the needed functionality within it. This is an alternative view of the operator value issue - a suitable network slice environment can help create value in aggregation across different operational domains. For example, an operator of a domain-specific slice (e.g. for medical or financial applications) can add value both by aggregation of the network slice capabilities across multiple providers (e.g. the connectivity and performance isolation) and by implementing particular policies that are appropriate to that domain (e.g. centralized auditing, or rigorous policies for connection to the slice).

This may sound like a pie-in-the-sky wish list, but there is good news: all the capabilities required have been demonstrated, for example in the EU PRISTINE project (no. 619305, http://ict-pristine.eu/) using the clean-slate RINA framework (Recursive InterNetwork Architecture). RINA delivers connectivity isolation by default, and PRISTINE has shown how this can be ‘orchestrated’ by an automatic network management system (see the article “Progressive Network Transformation with RINA” in this newsletter), and how performance isolation can be engineered within this framework [1]. Details of this were presented at the SDN and Openflow World Congress 2016 in Den Haag [2].

[1] Sergio Leon Gaixas, Jordi Perelló, Davide Careglio, Eduard Grasa, Miquel Tarzan, Neil Davies, Peter Thompson. “Assuring QoS Guarantees for Heterogeneous Services in RINA Networks with ΔQ”. NetCloud 2016.
[2] PRISTINE Workshop, presentations available from SlideShare via http://ict-pristine.eu/?p=936

Neil Davies is an expert in resolving the practical and theoretical challenges of large scale distributed and high-performance computing. He is a computer scientist, mathematician and hands-on software developer who builds both rigorously engineered working systems and scalable demonstrators of new computing and networking concepts. His interests center around scalability effects in large distributed systems, their operational quality, and how to manage their degradation gracefully under saturation and in adverse operational conditions. This has led to recent work with Ofcom on scalability and traffic management in national infrastructures.

Throughout his 20-year career at the University of Bristol he was involved with early developments in networking, its protocols and their implementations. During this time he collaborated with organizations such as NATS, Nuclear Electric, HSE, STMicroelectronics and CERN on issues relating to scalable performance and operational safety. He was also technical lead on several large EU Framework collaborations relating to high performance switching. Mentoring PhD candidates is a particular interest; Neil has worked with CERN students on the performance aspects of data acquisition for the ATLAS experiment, and has ongoing collaborative relationships with other institutions.

Peter Thompson became Chief Technical Officer of Predictable Network Solutions in 2012 after several years as Chief Scientist of GoS Networks (formerly U4EA Technologies). Prior to that he was CEO and one of the founders (together with Neil Davies) of Degree2 Innovations, a company established to commercialize advanced research into network QoS/QoE, undertaken during four years that he was a Senior Research Fellow at the Partnership in Advanced Computing Technology in Bristol, England. Previously he spent eleven years at STMicroelectronics (formerly INMOS), where one of his numerous patents for parallel computing and communications received a corporate World-wide Technical Achievement Award. For five years he was the Subject Editor for VLSI and Architectures of the journal Microprocessors and Microsystems, published by Elsevier. He has degrees in mathematics and physics from the Universities of Warwick and Cambridge, and spent five years doing research in general relativity and quantum theory at the University of Oxford.

Editor:

Chris Hrivnak is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) and IEEE Photonics Society. He is also a member of the IEEE Cloud Computing Community, IEEE Life Sciences Community, IEEE Smart Grid Community, the IEEE Software Defined Networks (SDN) Community, IEEE Internet of Things (IoT) Technical Community and the IEEE Internet Technology Policy Community.