Optimising our world with mathematical models - with Jane Hillston

Brief Summary

This presentation explores the use of mathematical models to understand and optimize resource allocation in various systems, focusing on queuing theory and its evolution to address the complexities of modern systems like AI. It highlights the tension between user expectations for timely and high-quality access to resources and system operators' need for cost-effective and sustainable resource utilization. The talk traces the history of queuing theory from Erlang's work on telephone networks to its applications in computer systems, biological systems, and AI, emphasizing the importance of considering resource constraints in system design and performance evaluation.

  • Queuing theory helps understand resource allocation in various systems.
  • Balancing user expectations and system operator needs is crucial.
  • Mathematical models are essential for optimizing resource use in AI and other complex systems.

Introduction: The Importance of Mathematical Models in Understanding Systems

The speaker introduces the concept of using mathematical models to understand systems, particularly focusing on resources and the challenges of limited resource availability. Queues are presented as a central element in managing limited resources, a concept humorously associated with British culture but recognized as a universal phenomenon. The fundamental question posed is whether queues can be eliminated, suggesting that infinite resources would be the solution, though practically unattainable.

Resources and System Behavior: From Airport Security to Train Networks

The discussion shifts to how resources are key to understanding system behavior, citing examples like airport security and train networks. Resources can be obvious, like counters at a post office, or less apparent, like sections of railway track. Queues form when resources are not immediately available, leading to delays. The goal is to tune systems for efficiency by minimizing queues, which represent nonproductive waiting time. This type of modeling is known as performance modeling, aimed at understanding system performance as constrained by resources.

Perspectives on Resource Use: Users vs. System Operators

The speaker discusses the different perspectives on what constitutes reasonable resource use, considering fairness, equity, affordability, and sustainability. Users typically want timely access, good quality, and availability, focusing on response time and throughput while seeking to avoid wasted effort and the chance of being blocked. System operators, on the other hand, aim for cost-effective utilization, avoiding both low utilization (idle resources) and excessively high loads that could make the system fragile. The conflicting requirements of users and system operators highlight the role of performance modelers in finding a balance through "what-if" scenarios.

The Origins of Queuing Theory: Agner Erlang and Telephone Networks

The presentation traces the origins of queuing theory to Agner Erlang, a Danish mathematician and engineer in the early 20th century. Erlang was tasked with determining the number of wires needed in telephone exchanges to ensure call connectivity. He developed a formula to calculate the probability of dropped calls due to insufficient connections. The Erlang formula considers the workload (E), which is the product of the arrival rate of calls and the average call duration, and the number of servers (M), representing the available connections.

Erlang's Loss Formula: Understanding Call Blocking Probability

The speaker explains Erlang's formula, which calculates the probability of a call being blocked due to a lack of available connections in a telephone exchange. The formula uses the workload parameter E (arrival rate multiplied by average call duration) and the number of servers M (available connections) to determine the blocking probability. This formula is still used today in call centers to determine staffing needs based on call volume and duration.
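
The blocking probability can be computed with the standard, numerically stable recurrence for the Erlang B formula. A minimal sketch (the function and example figures are my own, not from the talk):

```python
def erlang_b(E: float, M: int) -> float:
    """Erlang B loss formula: the probability that a call arriving at an
    exchange with M connections and offered load E erlangs is blocked.
    Uses the recurrence B(E, m) = E*B(E, m-1) / (m + E*B(E, m-1))
    rather than evaluating E**M / M! directly, which overflows quickly."""
    b = 1.0  # B(E, 0): with no connections, every call is blocked
    for m in range(1, M + 1):
        b = E * b / (m + E * b)
    return b

# Offered load E = (arrival rate) x (mean call duration), e.g.
# 120 calls/hour lasting 3 minutes each gives E = 120 * (3/60) = 6 erlangs.
blocking = erlang_b(6.0, 10)  # blocking probability with 10 lines
```

The same calculation underlies call-centre staffing: increase M until the blocking probability falls below an agreed service target.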

Queuing Theory Abstraction: Requests, Jobs, and Service Centers

Erlang's work led to the development of queuing theory, which abstracts systems as queues in a technical sense. Requests arriving at the system generate jobs that wait in a queue to be processed by a server; once processed, a job leaves the system. By observing the system over a period of time (T), one can measure the number of arrivals (A), the number of completions (C), and the server's busy time (B) to understand the system's behavior.

Basic Observations and Parameters: Arrival Rate, Throughput, and Utilization

By observing a system, key parameters can be determined, including the arrival rate (number of arrivals per unit time), throughput (number of completions per unit time), utilization (proportion of busy time), and mean service time (average time each job spends in service). These parameters provide a basic understanding of the system's performance and can be used to feed into mathematical models.
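These observational quantities plug directly into the standard operational laws. A small sketch (variable names and the example figures are my own):

```python
def operational_laws(A: int, C: int, B: float, T: float) -> dict:
    """Derive basic performance parameters from one observation period.

    A: number of arrivals observed during time T
    C: number of completions observed during time T
    B: total time the server was busy during T
    """
    return {
        "arrival_rate": A / T,        # lambda = A / T
        "throughput": C / T,          # X = C / T
        "utilization": B / T,         # U = B / T
        "mean_service_time": B / C,   # S = B / C
    }

# Observe a server for 60 s: 120 arrivals, 118 completions, busy for 45 s.
m = operational_laws(A=120, C=118, B=45.0, T=60.0)
```

Note the utilization law falls out for free: U = X * S, since (C/T) * (B/C) = B/T.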

Formalizing the Model: Service Centers and Queuing Disciplines

The presentation formalizes the queuing model, representing a system as a service center with a server and a buffer (queue). This abstraction allows for variations, such as multiple servers with a shared queue. Different service disciplines, like first-come-first-served, priority order, or shared service, can be applied, leading to rich mathematical models.

Little's Law: Relating Number in System, Arrival Rate, and Waiting Time

Little's Law is introduced, which relates the average number of customers in a system (L) to the arrival rate (λ) and the average time a customer spends in the system (W). This simple formula, L = λW, is intuitive for deterministic systems and was proven by John Little to work for any type of queue, regardless of service discipline or time distribution.
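Little's Law can be checked against the standard closed-form results for a single M/M/1 queue, where both quantities are known independently (the parameter values below are illustrative):

```python
def mm1_metrics(lam: float, mu: float):
    """Standard closed-form M/M/1 results (requires lam < mu for stability).

    Returns (L, W): mean number in the system and mean time in the system.
    """
    rho = lam / mu          # utilization
    L = rho / (1 - rho)     # mean number of customers in the system
    W = 1 / (mu - lam)      # mean time a customer spends in the system
    return L, W

lam, mu = 4.0, 5.0
L, W = mm1_metrics(lam, mu)
# Little's Law: L = lam * W, here 4.0 = 4.0 * 1.0
```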

The Role of Probabilities: Capturing Randomness and Variability

The use of probabilities is explained as a way to model the random choices and variability in systems. Probabilities capture individual behaviors within a large population, such as customers changing their minds or rejoining queues. Probability distributions, like the exponential distribution, are used to represent the varying service times required by different customers.
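Drawing service times from an exponential distribution is easy to demonstrate; the sample mean converges to 1/rate, the average service time (rate value is illustrative):

```python
import random

random.seed(42)   # fixed seed so the demonstration is repeatable
mu = 1.25         # service rate: mean service time 1/mu = 0.8 time units

# Exponentially distributed service times model the varying demands
# that different customers place on the server.
samples = [random.expovariate(mu) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)   # approx 0.8
```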

The Impact of Randomness: Deterministic vs. Random Systems

The presentation illustrates the impact of randomness by comparing deterministic and random systems. In a deterministic system with regular arrivals and fixed service times, there may be no queue. However, in a random system with bursty arrivals and variable service times, queues can build up even with the same average arrival rate and utilization. Randomness and variability significantly affect the experience of these systems.
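This comparison can be reproduced with the Lindley recursion for successive waiting times. The sketch below (parameter values are my own) keeps the average arrival rate and utilization identical in both cases, yet only the random case builds a queue:

```python
import random

def mean_wait(next_interarrival, next_service, n=200_000):
    """Mean waiting time over n customers via the Lindley recursion:
    W_{k+1} = max(0, W_k + S_k - A_{k+1})."""
    w, total = 0.0, 0.0
    for _ in range(n):
        total += w
        w = max(0.0, w + next_service() - next_interarrival())
    return total / n

random.seed(7)
# Deterministic: one arrival per time unit, each needing 0.8 units of service.
det = mean_wait(lambda: 1.0, lambda: 0.8)
# Random: the same averages (rate-1 arrivals, mean-0.8 service),
# but exponentially distributed, so arrivals come in bursts.
rnd = mean_wait(lambda: random.expovariate(1.0),
                lambda: random.expovariate(1.25))
# det is exactly 0; rnd is positive (theory predicts about 3.2).
```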

Modeling Queues with Markov Chains: States and Transitions

Queues are modeled using Markov chains, where the state of the system is represented by the number of people waiting. Transitions between states occur due to arrivals and departures. This representation can be translated into a matrix, allowing for the calculation of key metrics like average queue length and response time.
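For the single-server queue this birth-death chain can be solved directly: the balance equations make the probability of each state proportional to rho**n, from which metrics such as the average queue length follow. A sketch (truncating the infinite chain at a finite capacity):

```python
def mm1_stationary(lam: float, mu: float, cap: int):
    """Stationary distribution of the M/M/1 birth-death Markov chain,
    truncated at `cap` customers. The balance equations give
    pi[n] proportional to rho**n with rho = lam / mu."""
    rho = lam / mu
    raw = [rho ** n for n in range(cap + 1)]
    Z = sum(raw)                    # normalize so probabilities sum to 1
    return [p / Z for p in raw]

pi = mm1_stationary(lam=4.0, mu=5.0, cap=200)
mean_queue_len = sum(n * p for n, p in enumerate(pi))
# With a large enough cap this approaches the closed form rho/(1-rho) = 4.0.
```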

Networks of Queues: Modeling Multiple Resources

Most systems involve multiple resources, which can be modeled as networks of queues. Each resource is represented as a separate queue, and the interconnections between them are mapped using routing probabilities. This approach allows for the compositional construction and solution of complex systems, leveraging the elegant mathematics of queuing theory.
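The routing probabilities determine each queue's total arrival rate through the traffic equations lam_i = gamma_i + sum_j lam_j * P[j][i], where gamma_i is the external arrival rate at queue i. A minimal sketch solving them by fixed-point iteration (the two-queue example is my own):

```python
def visit_rates(gamma, P, iters=200):
    """Solve the traffic equations lam_i = gamma_i + sum_j lam_j * P[j][i]
    by fixed-point iteration (converges for open, stable networks)."""
    lam = list(gamma)
    for _ in range(iters):
        lam = [gamma[i] + sum(lam[j] * P[j][i] for j in range(len(gamma)))
               for i in range(len(gamma))]
    return lam

# Two queues: external work enters queue 0 at rate 1; everything leaving
# queue 0 goes to queue 1; half of queue 1's output feeds back to queue 0
# and the other half leaves the network.
gamma = [1.0, 0.0]
P = [[0.0, 1.0],
     [0.5, 0.0]]
lam = visit_rates(gamma, P)   # expected: [2.0, 2.0]
```

Once each queue's arrival rate is known, each queue can be analyzed separately, which is exactly the compositionality the section describes.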

Applications of Queuing Theory: From Job Shops to Computer Systems

Queuing theory has been widely applied in various domains. In the early 20th century, it was used to optimize job shops and assembly lines. Around 1960, it was recognized as a valuable tool for modeling computer systems, helping to determine buffer sizes and manage data processing.

Limitations of Classic Queuing Theory: The Rise of Interconnected Systems

The rise of the internet and interconnected systems in the 1980s revealed the limitations of classic queuing theory. Traditional queuing networks assume sequential processing, where jobs use one resource at a time and cannot be merged or split. They also assume independent arrivals and that customers do not change their behavior based on what they observe. These assumptions do not hold in modern systems where concurrency and feedback loops are common.

New Abstractions: Process Algebras and Concurrency

To address the limitations of classic queuing theory, new abstractions were needed to model concurrent systems. Process algebras emerged as a way to describe systems with multiple things happening at once. These algebras use operators to combine small processes according to certain rules, allowing for reasoning about the behavior of the composite system.

Performance Evaluation Process Algebra (PEPA): A Programming Language for Mathematical Models

Performance Evaluation Process Algebra (PEPA) is introduced as a programming language that compiles into mathematical models. PEPA focuses on the components within a system, such as customers and resources, and allows for more flexible interactions than queuing networks. The language compiles down to Markov chains, enabling the analysis of system performance.

Building Models with PEPA: Components and Interactions

The presentation demonstrates how to build models using PEPA, showing examples of a simple queue and an application interacting with a web server. PEPA allows for the creation of sub-models that can be combined to form larger models of the whole system. The interactions between components can be synchronized, allowing for more sophisticated behaviors.
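To give a flavour of the notation, here is an illustrative sketch in PEPA-like syntax (my own toy model, not one from the talk). Each component is defined by the timed activities it can perform, and the cooperation operator forces components to synchronise on the shared activities; the rate T marks the passive partner, whose rate is set by the other participant.

```
% An application that requests pages from a web server.
App         = (request, r_req).App_wait;
App_wait    = (respond, T).App;            % passive: server sets the rate
Server      = (request, T).Server_busy;
Server_busy = (respond, r_resp).Server;

% Cooperation: App and Server must perform request and respond together.
System = App <request, respond> Server;
```

A tool then unfolds this description into the states and transitions of a Markov chain, as described in the previous sections.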

Trade-offs: Expressiveness vs. Decomposability

While PEPA offers greater expressiveness than queuing networks, it comes at the cost of decomposability. Unlike queuing networks, PEPA models must be analyzed as one large mathematical system. However, software tools can automate the generation of the mathematics and the calculation of performance metrics.

Real-World Applications of PEPA: From Canals to Cell Biology

PEPA and similar techniques have been applied to a wide range of real-world systems, including optimizing freight movement in Belgian canals, tuning robotic work cells, managing cellular telephone networks, and ensuring the performance of onboard computer systems in buses. Surprisingly, these techniques have also found applications in biological systems, such as modeling cell cycles and invasive species.

Modeling Cell Biology with Bio-PEPA: Ordinary Differential Equations

The presentation delves into an example of using Bio-PEPA to model the cell cycle, focusing on the role of cyclin in regulating cell division. Bio-PEPA compiles to ordinary differential equations, which are used to model concentrations of proteins rather than counting individual molecules. Simulations of these equations can reveal how changes in protein concentrations affect the cell cycle.
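As a deliberately minimal stand-in for the kind of ODE system a Bio-PEPA model compiles to (not the cell-cycle model from the talk), consider a single protein with constant synthesis and first-order degradation, d[c]/dt = k_s - k_d*c, integrated by Euler's method:

```python
def simulate_concentration(k_s: float, k_d: float,
                           c0: float = 0.0, dt: float = 0.001,
                           t_end: float = 50.0) -> float:
    """Euler integration of d[c]/dt = k_s - k_d * c: a protein
    concentration with constant synthesis and first-order degradation.
    Returns the concentration at time t_end."""
    c = c0
    for _ in range(int(t_end / dt)):
        c += dt * (k_s - k_d * c)
    return c

# The concentration settles at the steady state k_s / k_d = 2.0.
final = simulate_concentration(k_s=1.0, k_d=0.5)
```

Changing a rate constant here corresponds to altering a protein's synthesis or degradation in the cell, which is how such simulations reveal the effect of concentration changes on the cycle.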

Future Challenges: AI and Resource Consumption

The speaker identifies AI as a significant future challenge, particularly concerning the resource consumption of large language models. These models are very expensive to run in terms of cost, energy, and computing equipment. The focus on capability in the AI community has led to a neglect of resource considerations, resulting in ever-larger models with high energy consumption.

The Need for Performance Modeling in AI: Beyond User-Defined Performance

The speaker argues for the need for performance modeling in AI to address the growing resource consumption of large language models. Traditional performance metrics like response time and utilization may not be suitable for evaluating the training of these models. New metrics and approaches are needed to understand and optimize the resource use of AI systems.

Data Collection and Analysis: Understanding Resource Use in AI

Efforts are underway to collect data on the resource consumption of AI systems. The site LLM stats, maintained by Berkeley, catalogs these systems to understand their use of resources better. Analysis of this data reveals trade-offs between performance and cost, highlighting the need for more efficient AI models.

DeepSeek: A Case Study in Efficient AI

DeepSeek is presented as a case study in efficient AI, achieving better performance with fewer resources. This was driven by the necessity of limited access to computing resources, leading to better engineering and the use of concurrency.

Conclusion: The Importance of Considering Resources

The presentation concludes by emphasizing the major role of resources and contention for them in many systems. Mathematical models can help understand these resources, and failure to consider them can lead to unrealistic expectations and unlimited costs. Even simple models like Erlang's formula can provide insight and help use resources more effectively.
