
  1. The prime numbers are those integers greater than $1$ that can only be divided by themselves and $1$. The first seven are

    $2, 3, 5, 7, 11, 13, 17.$

    Sieve of Eratosthenes illustration

    This is an illustration of the sieve of Eratosthenes, which is designed to catch prime numbers. You can find out more about the sieve here. Adapted from a figure by SKopp, CC BY-SA 3.0.

    Every positive integer greater than $1$ can be written as a product of prime numbers in a unique way — for example $30=2\times 3\times 5 $ — so the prime numbers are like basic building blocks that other integers can be constructed from. This is why people find them interesting.

    We have known for thousands of years that there are infinitely many prime numbers (see here for a proof), but there isn't a simple formula which tells us what they all are. Powerful computer algorithms have enabled us to find larger and larger primes, but we will never be able to write down all of them.

    The prime number theorem tells us something about how the prime numbers are distributed among the other integers. It attempts to answer the question "given a positive integer $n$, how many integers up to and including $n$ are prime numbers"?

    The prime number theorem doesn’t answer this question precisely, but instead gives an approximation. Loosely speaking, it says that for large integers $n$, the expression

      \[ \frac{n}{\ln {(n)}} \]    

    is a good estimate for the number of primes up to and including $n$, and that the estimate gets better as $n$ gets larger. Here $\ln {(n)}$ is the natural logarithm of $n$, which you can find on your calculator.

    As an example, let's take $n=1,000$. The actual number of primes up to and including $1,000$ (which you can look up in this list) is $168$. Our estimate is

      \[ \frac{1,000}{\ln {(1,000)}} = \frac{1,000}{6.9}\approx 145, \]    

    which is a pretty good approximation.

    However, to be precise about what the prime number theorem tells us, we need to say what we mean by "a good estimate". The prime number theorem does not say that for large values of $n$ the difference between the true value and our approximation  is close to $0.$ Instead, it tells us something about the question "what’s the approximation as a percentage of the true value"?

    To go back to our example of $n=1,000$, the true value was $168$ and the approximation was $145$. Therefore, the approximation constitutes a proportion of

      \[ \frac{145}{168}\approx 0.86, \]    

    of the true result, which translates to 86%. Not bad.

    For $n=100,000$ the actual number of primes up to and including $100,000$ is $9,592$, so that’s the true value, and the estimate was

      \[ \frac{100,000}{\ln {(100,000)}}\approx \frac{100,000}{11.513}\approx 8,686. \]    

    Therefore in this case the estimate constitutes a proportion of

      \[ \frac{8,686}{9,592}\approx 0.91, \]    

    of the true value. This translates to 91%, so here the estimate is better than for $n=1,000$.

    Generally, the prime number theorem tells us that for large $n$ the approximation $n/\ln {(n)}$ is nearly 100% of the true value. In fact you can get it as close to 100% as you like by choosing a large enough $n.$
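You can check these figures with a few lines of Python. The function name `prime_count` is ours; it counts primes using the sieve of Eratosthenes mentioned above, and then compares the count to the prime number theorem's approximation:

```python
import math

def prime_count(n):
    """Count the primes up to and including n with a sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out every multiple of p from p*p onwards.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return sum(is_prime)

for n in (1_000, 100_000):
    estimate = n / math.log(n)  # the approximation n / ln(n)
    print(n, prime_count(n), round(estimate), round(estimate / prime_count(n), 2))
```

For $n=1,000$ this reproduces the values $168$ and $145$ used above, and it shows the ratio creeping towards $1$ as $n$ grows.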

    A graph of the prime counting function and its approximation

    The red curve shows the number of primes up to and including n, where n is measured on the horizontal axis. The blue curve gives the value of n/ln(n). The difference between true result and approximation increases as n grows, but the ratio between the two values tends to 1.

    To state the prime number theorem in its full mathematical glory using mathematical notation, let’s first write $\pi (n)$ for the number of primes that are smaller than, or equal to, $n$. The theorem says that

      \begin{equation} \lim _{n \to \infty } \frac{(n/\ln {(n)})}{\pi (n)} = 1.\end{equation}   (1)

    Since the limit is $1$, which is in particular non-zero, you can swap the numerator and denominator here, so expression (1) is equivalent to

      \begin{equation} \lim _{n \to \infty } \frac{\pi (n)}{(n/\ln {(n)})}= 1.\end{equation}   (2)

    The prime number theorem is usually stated using the second expression (2). It is also sometimes written as

      \[ \pi (n) \sim \frac{n}{\ln {(n)}}, \]    

    which in spoken language is "$\pi (n)$ is asymptotic to $n/\ln {(n)}$ as $n$ tends to infinity."

    This article was produced as part of our collaboration with the Isaac Newton Institute for Mathematical Sciences (INI) – you can find all the content from the collaboration here.

    The INI is an international research centre and our neighbour here on the University of Cambridge's maths campus. It attracts leading mathematical scientists from all over the world, and is open to all. Visit to find out more.

    INI logo

  2. Marianne Freiberger

    "This year we will have students going into the third year of their degree who have not had a single complete year of normal face-to-face education," says epidemiologist Mike Tildesley from the University of Warwick. It's a stark illustration of just how much the pandemic has disrupted student life, depriving students of what should have been one of the most formative, and fun, periods of their lives.

    See here for all our coverage of the COVID-19 pandemic.

    This year most universities will no doubt do their best to give their students the experience they deserve. But even though there is no longer a government mandate on how to manage COVID-19 in higher education, many are still likely to employ some kind of measures to protect their students and the surrounding communities from COVID-19.

    Better the virus you know

    Lecture hall

    In the last academic year many lecture theatres remained empty.

    In contrast to 2020, when there was very little information to go on, universities will now have a body of research to guide their decisions. Tildesley, together with thirteen other epidemiologists and mathematicians, has recently published a comprehensive study, which analyses data from last year's autumn term to see what actually happened at universities in terms of COVID-19, and uses mathematical modelling to see whether interventions could help reduce outbreaks.

    Although this year is different, in both good ways (vaccinations) and bad ways (the Delta variant), the lessons learnt should on the whole still hold. "Things like vaccination will probably change some of the quantitative aspects of our results, but not necessarily the qualitative aspects," says Tildesley.

    The study is a result of a concerted effort by various research groups that started back in the summer of 2020, with a virtual workshop hosted by the Virtual Forum for Knowledge Exchange in the Mathematical Sciences (V-KEMS) and the Newton Gateway to Mathematics. Two follow-up events were run as part of the Infectious dynamics of pandemics programme at the Isaac Newton Institute in Cambridge, and a working group continued to meet regularly after that. Seven of the study's authors, including Tildesley, are members of the JUNIPER modelling consortium.

    The study suggests answers to a number of questions both students and staff of universities will find interesting. You can jump straight to the question you are interested in by clicking on the links below, or just keep reading to see the answers to them all.

    How did (and will) university outbreaks originate?

    When the autumn term started last year, the number of COVID-19 cases was on the rise in the UK. Following government advice, most universities offered a mixture of face-to-face and online teaching. Many also offered a testing regime and segmented halls of residence into smaller households.

    Despite these restrictions, some universities still saw large outbreaks. You probably remember the newspaper headlines, featuring desperate students stuck in self-isolation and complaining about inadequate food supplies. Luckily these weren't the norm. The recent analysis shows significant variation between institutions, raising the question of how those large outbreaks had originated.

    The mass migration at the start of term had always been a prime suspect in this context, so the team behind the recent study estimated the proportion of students likely to have arrived at each university already infected, based on COVID-19 data from the regions students came from. A simple probabilistic model then suggested that this proportion is linked to the probability of the university seeing an outbreak (which was defined as having more than a certain threshold number of cases). In other words, the pattern of observed outbreaks across universities was consistent with the idea that these outbreaks had been seeded by students arriving at the start of term already infected. (If you'd like to look at some of the maths involved in this model, see below.)

    While the model is simple and comes with caveats, the result suggests a lesson worth keeping in mind for the upcoming autumn term. "The likelihood of seeing campus outbreaks is going to be dependent on the prevalence in the communities students are coming from," says Tildesley. "The size of any outbreak will depend on the transmissibility of the virus — we now have a more transmissible variant than twelve months ago. But we now also have a proportion of students being vaccinated. Those factors will trade off against each other."

    Halls of residence

    Once COVID-19 has entered a university, halls of residence provide an obvious breeding ground. To keep things under control when students arrived last autumn many universities divided their halls into smaller households: students were not supposed to socialise with anyone outside their own household and the whole household would go into isolation once a member showed symptoms or tested positive.

    Apple, books and blackboard

    What is the academic year 2021/2022 going to look like?

    At the time these measures were based on intuition: there wasn't much evidence on how the risk of transmission within halls compares to the risk in the community, and whether segmenting halls would make any difference.

    To gain some clarity, the team behind the recent study used as an example the data from one particular university, which manages 19 halls in total and had divided each into households of up to 16 members. They used a statistical technique called multi-variate regression (a version of linear regression which works when there is more than one explanatory variable) to figure out to what extent various factors would impact on the proportion of people in the halls that are infected by an incoming infection. (This proportion is called the secondary attack rate.) The factors the researchers considered here included the overall size of the hall, the median size of households in the hall, the proportion of students sharing bathrooms, and the proportion of students on medical courses (as a proxy for students at higher risk due to their course).

    The result is perhaps surprising: only the overall size of the hall and the proportion of students sharing bathrooms were associated with the secondary attack rate in the final multi-variate analysis. Although, as ever, there are limitations to the analysis, this suggests that splitting halls into households is of limited value when it comes to reducing the risk of transmission. A lot more effective, the analysis suggests, would be to only partially fill the hall and coordinate things so that the use of shared spaces is limited.
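The flavour of such a multi-variate regression can be sketched in a few lines of Python. Every number below is made up purely for illustration — none of these are the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data for 19 halls (purely illustrative, not the study's data):
# columns are hall size, median household size, proportion sharing bathrooms.
X = rng.uniform(low=[50, 4, 0.0], high=[500, 16, 1.0], size=(19, 3))

# Fake secondary attack rates, built here to depend on hall size and
# bathroom sharing but not on household size.
y = 0.0004 * X[:, 0] + 0.15 * X[:, 2] + rng.normal(0, 0.01, size=19)

# Fit y against all three factors at once by least squares.
A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)
for name, value in zip(["intercept", "hall size", "household size", "bathrooms"],
                       coefficients):
    print(f"{name}: {value:.4f}")
```

The fit recovers the hall size and bathroom effects that were baked into the fake data, and finds essentially none for household size — the same kind of conclusion the real analysis drew from the real data, which of course also reported the uncertainty attached to each estimate.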

    Spilling over

    Last year's outbreaks on university campuses condemned many students to self-isolation in small rooms far away from home, with awful consequences for some people's mental health. But with most students being young, fit and healthy, the risk of them becoming seriously ill or dying was low.

    The same couldn't be said for the surrounding community, however. One of the worries about university outbreaks was that they would spill over, posing a high risk to vulnerable people living nearby. "One of the things we were particularly interested in was trying to identify any potential links between universities and communities," says Tildesley. "So that's looking for signals that might indicate whether a rise in cases in surrounding communities was in any way linked to a rise in cases on university campuses."

    CMS, Cambridge

    The Centre for Mathematical Sciences at the University of Cambridge (home of Plus). In Cambridge there are many interactions between students and the community.

    Since students mix and interact with the local community, such links are hard to tease out from the data, but the team decided to use two approaches: one looked at geographical data on confirmed COVID-19 cases in areas with a high density of students and those nearby, while the other took account of people's ages. Those who were 18 to 24 years old served as a proxy for students, and other age groups represented the rest of the community.

    Careful analysis of these data sets, and information on outbreaks provided by universities, revealed a complex picture. "The conclusion we found here is that we did see evidence of spill-over in some, but not all cases," says Tildesley. "So there is no direct link there that applied in all settings."

    The reason for this is probably that universities are unique in the way they sit within their local communities (for example if they are campus universities or not) and that different universities applied different COVID measures. In one case, the researchers suggest that a large outbreak did not spill over because the university in question routinely tested students whether they had symptoms or not. This meant that even those who did not show symptoms could be made to self-isolate, rather than unwittingly spreading their infection further.

    Staggering returns

    As autumn 2020 progressed into winter, Christmas started to loom large. The prospect of students fanning out across the country for their holidays, only to return in January potentially infected, led the UK government to advise universities to stagger their students' return over a five-week period. They were also to be offered tests on arrival.

    The national lockdown imposed on January 4, 2021, prevented the staggering from being tried out, as most courses were then moved online. Instead of analysing data, the researchers therefore used mathematical models to predict the effect staggering returns might have on outbreaks at universities.

    "What we found is that ultimately, over the course of the term, there was no particular impact of staggering on the overall number of cases you would see on campus," says Tildesley. The work also showed that, while staggering can reduce the time students spend in self-isolation, and the peak number of students self-isolating on any given day, this only happens when the proportion of students testing positive is small. When it is high, then staggering can actually make things worse, as student households end up going into isolation several times.

    The researchers used four different mathematical models to arrive at their results. Two of them (an SIR model and an individual-based model) were kept nice and simple, while the other two (a compartmental model and a model based on a network) included a little more complexity. These models simulated the growth or decline of infections in hypothetical populations of students under various assumptions on the exact nature of the staggering and testing policy, mixing patterns between students and the social groups they belong to, the chance a student tests positive, and so on. You can find the details on the assumptions used for each model in the paper.
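The paper's models are far more detailed, but the flavour of the simplest, an SIR-type model, fits in a few lines. Every parameter value below is invented for illustration and none is taken from the study:

```python
# A minimal discrete-time SIR model: S susceptible, I infected, R recovered.
# beta is the transmission rate, gamma the recovery rate; all values here
# are invented for illustration only.
def sir(population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=120):
    s, i, r = population - initial_infected, initial_infected, 0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append(i)  # record the number currently infected each day
    return history

print(f"peak number infected: {max(sir()):.0f}")
```

Running this traces out the familiar epidemic curve: infections grow roughly exponentially at first, peak once enough people have been infected, and then decline. The study's more complex models add households, contact networks and testing policies on top of this basic machinery.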

    Interestingly, one model also considered students' behaviour — to be precise the extent to which they isolate when they are told to, get themselves tested and have their contacts traced. "The model suggests that adherence to test, trace and isolate is a much stronger driver in reducing transmission than staggering," says Tildesley. You can stagger returns as much as you like: if your students don't stick to test, trace and isolate rules, the effect will be washed out.

    Testing, testing, testing

    lateral flow test

    The NHS test kit (Image: tommoh29)

    The January 2021 lockdown was imposed because of the emergence of the Alpha variant. This prompted the researchers to model how different testing regimes would fare in the face of more transmissible variants. They were particularly interested in regimes that routinely test students whether they have symptoms or not, and can therefore catch asymptomatic and pre-symptomatic cases. The model they used involved a layered network, with one layer representing students' household contacts and the other representing all their other contacts. The model was then used to gauge the impact on the number of positive cases under five different testing strategies: testing every 3, 7, 10 or 14 days, or no testing at all.

    The results show what you might expect: that more frequent testing better controls the number of infections. What's somewhat shocking, though, is that once you're dealing with a more transmissible variant such as the Alpha variant, testing needs to be at its most frequent to prevent a major outbreak — and that means testing every student every three days. The currently dominant Delta variant is of course even more transmissible than Alpha.

    So what will happen this year?

    The biggest difference between this and last year is that now we have the vaccines. But since not all students will be fully vaccinated, and the vaccines aren't 100% effective, outbreaks on campuses are still possible. The results of the recent study still give important guidance on the qualitative nature of these outbreaks, their links to the community, and what measures might work to fight them.

    "One of the key things we are trying to establish at the moment is the proportion of students that will be vaccinated, as that will obviously have an impact on what we might expect to see upon students returning come the end of September," says Tildesley. "Of course the vaccination status of international students is also important, and could potentially drive what will happen." As mentioned above, the effect of a proportion of students being vaccinated will play off against the more transmissible Delta variant, and the amount of COVID in students' home regions will impact the number of students arriving at university infected. "What happens in September in the community is going to give us an indication of what we expect to see when universities reopen," says Tildesley.


    The effect of vaccines will play off against the greater transmissibility of the Delta variant.

    In general, Tildesley thinks that each university should take an approach tailored to its own unique conditions. "Universities are very diverse in the way they operate, so I think they need to take a risk-based approach," he says. "They need to base their decisions on the risk to their students and the potential benefits to their students."

    "Clearly we want to prevent large outbreaks on student campuses, but we need to think about students' mental health as well. I know how important it is for students to experience student life and they have just not done that in the last two years. This is why we need to look at the risks associated with returning to face-to-face teaching and relaxation of the rules, and weigh them up against the benefits of these things for students' own well-being and learning. There has to be some kind of balanced approach moving forward to try and get back to normality."

    A bit of maths...

    Here’s a way of figuring out whether the pattern of outbreaks at universities is consistent with the idea that they were seeded by students arriving infected at the beginning of term. First stipulate the number of cases that constitute an outbreak. In their paper the researchers used 200 cases, and then repeated their analysis for 400 cases. Next, estimate the number $n$ of incoming infected students for each university, and allocate universities into groups according to the value of $n$: the first group contains universities with up to 10 incoming infections, the second with between 10 and 20 incoming infections, and so on.

    From the data for each outbreak it is possible to estimate the probability $p$ that an incoming infection fails to produce an outbreak. If all the outbreaks were seeded by incoming infections, then the probability $P$ that a university experiences an outbreak would be

      \[ P=1-p^ n. \]    

    (Here it’s assumed that the probabilities of infections failing to produce an outbreak are independent of each other.)

    Now for each of your groups (which were produced according to the number of incoming infections) calculate the fraction of universities in that group that experienced an outbreak. This then is an estimate of $P$ for universities in that group. If these estimates for $P$ look roughly like the theoretical probability calculated above, then this suggests that the pattern of outbreaks is consistent with the idea that they were seeded by incoming infections.
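In code the theoretical side of this check might look as follows; the value of $p$ below is invented for illustration, not the study's estimate:

```python
# Probability that a university with n incoming infected students experiences
# an outbreak, assuming each incoming infection independently fails to seed
# one with probability p. The value p = 0.9 is invented for illustration.
def outbreak_probability(n, p=0.9):
    return 1 - p ** n

# More incoming infections means a higher chance of an outbreak.
for n in (5, 15, 25, 50):
    print(n, round(outbreak_probability(n), 3))
```

Plotting this theoretical curve against the observed fraction of universities with outbreaks in each group is exactly the comparison shown in the figure below.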

    See the paper for more details.


    These plots are taken from the paper. They show how the theoretical probability calculated above (red curve) compares to the proportion of universities that had an outbreak (blue stars). In the left plot an outbreak was classified as more than 200 cases and in the right plot as more than 400 cases. Reproduced under CC BY 4.0.

    About this article

    Mike Tildesley

    Mike Tildesley

    Mike Tildesley is Professor in the Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research at the University of Warwick, a member of JUNIPER, and a member of the Scientific Pandemic Influenza Modelling group (SPI-M) that advises the government on the scientific aspects of the pandemic. You can listen to a podcast featuring Tildesley here.

    Tildesley was interviewed by Marianne Freiberger, Editor of Plus, in August 2021. She would like to thank Louise Dyson and Ed Hill, co-authors of the study and members of JUNIPER, for their help with this article.

    This article was produced as part of our collaborations with JUNIPER, the Joint UNIversity Pandemic and Epidemic Response modelling consortium, and the Isaac Newton Institute for Mathematical Sciences (INI).

    JUNIPER comprises academics from the universities of Cambridge, Warwick, Bristol, Exeter, Oxford, Manchester, and Lancaster, who are using a range of mathematical and statistical techniques to address pressing questions about the control of COVID-19. You can see more content produced with JUNIPER here.

    The INI is an international research centre and our neighbour here on the University of Cambridge's maths campus. It attracts leading mathematical scientists from all over the world, and is open to all. Visit to find out more.

    Juniper logo

    INI logo

  3. This week a team from the University of Graubünden in Switzerland announced that they have calculated the decimal digits of the number $\pi $ to a record 62.8 trillion places. The calculation took 108 days and 9 hours, making it over three times faster than the calculation behind the previous record, which stood at a mere 50 trillion places.


    Here are the first few digits of the number π.

    The number $\pi $ is what you get when you divide a circle's circumference by its diameter and is the same for any circle, no matter how big or small. The number is irrational, which means it can't be written as a fraction. It also means that its decimal expansion is infinitely long and doesn't end in repeating blocks. It would be impossible to write down the entire decimal expansion of $\pi $ (you'd need an infinite amount of time and an infinite piece of paper). All we can ever hope to do is calculate a finite piece of it, giving us an approximation of $\pi $ — the more decimal places, the more accurate the approximation.

    Because $\pi $ lies at the heart of any technology involving rotation or waves, it has many uses outside of maths. You need it to construct rotating parts in jet engines, for example, or the rings used in medical imaging devices like CAT or MRI scanners, and for frequency calculations for mobile phone and GPS devices (find out more in this article).

    Many of these uses of the number $\pi $ require you to know its value to a high degree of accuracy, but that's not the reason why the Swiss team, and most other $\pi $ record hunters out there, go to the effort they do. The latest record was achieved by a team from the Centre for Data Analytics, Visualization and Simulation (DAViS) at Graubünden, who wanted to test and show off their computing expertise. "Our attempt at the record had several aims," said Heiko Rölke, Head of DAViS. "In the course of the preparation and execution of the calculations we built up a lot of know-how and optimised our processes. This is particularly useful for our research groups with whom we are working on computing intensive projects in data analysis and simulation."

    Computers can perform the calculations a lot faster than any human could ever hope to, but at the heart of those calculations lie mathematical formulas you could, in theory at least, work out with pencil and paper. There are various mathematical algorithms that allow you to calculate the value of $\pi $ to any degree of accuracy if only you execute sufficiently many steps of the algorithm.

    A relatively simple one comes from the infinitely long sum

      \begin{equation}  1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \ldots \end{equation}   (1)

    The more terms of this sum you add (or subtract), the closer the result gets to the value of $\pi /4$. In fact, by adding sufficiently many terms you can get as close as you like to the value of $\pi /4$. You then only need to multiply your result by $4$ to get an approximation of $\pi $.

    The problem with using our expression (1) is that you need to include a large number of terms to get a decent degree of accuracy. If you include 100 terms and multiply the result by $4$, you get $3.1315929035585537$. This is fairly close to $\pi =3.141592653589793...$, but it's not a particularly accurate estimate given the effort involved in adding up 100 terms.
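You can check how slowly this partial sum converges with a short Python calculation (the function name `leibniz_pi` is ours):

```python
import math

def leibniz_pi(terms):
    """Approximate pi from the partial sum 1 - 1/3 + 1/5 - ... of expression (1)."""
    partial_sum = sum((-1) ** k / (2 * k + 1) for k in range(terms))
    return 4 * partial_sum

print(leibniz_pi(100))                 # about 0.01 away from pi
print(abs(leibniz_pi(100) - math.pi))  # the error shrinks only slowly
```

Even with 1,000 terms the result is still only correct to about three decimal places, which is why record hunters use much faster-converging formulas.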

    The DAViS team used an algorithm based on a more complex looking sum, which was published in 1988 by the Chudnovsky brothers. The sum is written as

      \[ \frac{1}{\pi } = 12 \sum _{q=0}^{\infty } \frac{(-1)^ q(6q)!(545140134q+13591409)}{(3q)!(q!)^3(640320)^{3q+3/2}}. \]    

    The symbol $\sum $ tells us that we are dealing with a sum in which each term has the form that comes after the $\sum $ symbol. The $q=0$ at the bottom of the $\sum $ means that for the first term in the sum you substitute $q=0$ in the expression after $\sum $, for the second term in the sum you substitute $q=1$ in the expression after $\sum $, for the third term in the sum you substitute $q=2$ in the expression after $\sum $, and so on. The infinity sign on top of the $\sum $ symbol means that you keep substituting values of $q$ forever, all the way up to infinity.

    For any positive integer $q$, the expression $q!$ stands for

      \[ q! = q \times (q-1) \times (q-2) \times (q-3) \times \ldots \times 2 \times 1, \]    

    with the convention that $0!=1$, which is needed for the first term of the sum.

    The more terms of this sum you add up, the closer the result gets to $1/\pi .$ You now only need to divide $1$ by your result to get an approximation for $\pi $.
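Here is a sketch of that calculation in Python (the function name `chudnovsky_pi` is ours), using the standard decimal module for high-precision arithmetic. Each term of the sum delivers roughly 14 more correct decimal digits:

```python
from decimal import Decimal, getcontext

def chudnovsky_pi(terms, precision=60):
    """Approximate pi with the first few terms of the Chudnovsky sum."""
    getcontext().prec = precision
    # Pre-compute the factorials 0!, 1!, ..., (6*terms)! that appear in the sum.
    factorial = [1]
    for k in range(1, 6 * terms + 1):
        factorial.append(factorial[-1] * k)
    total = Decimal(0)
    for q in range(terms):
        numerator = Decimal((-1) ** q * factorial[6 * q]
                            * (545140134 * q + 13591409))
        denominator = (Decimal(factorial[3 * q] * factorial[q] ** 3)
                       * Decimal(640320) ** (3 * q))
        total += numerator / denominator
    # Account for the remaining factor 12 / 640320^(3/2), then invert to get pi.
    total *= 12 / (Decimal(640320) * Decimal(640320).sqrt())
    return 1 / total

print(chudnovsky_pi(terms=3))  # correct to over 40 decimal places
```

Compare this with the 100 terms of expression (1) needed for just a couple of correct decimal places: even a single term of the Chudnovsky sum already gives $\pi $ to about 13 decimal places.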

    The Chudnovsky algorithm is the most efficient method that is known for calculating the decimal places of $\pi $.

    For more on how to approximate $\pi $ using infinite sums, see our article How to add up quickly by Chris Budd.

    For the record, the last ten digits in the known part of the decimal expansion of $\pi $ are $7817924264$. Congratulations to the DAViS team!

  4. Many aspects of our lives today are possible thanks to machine learning – where a machine is trained to do a specific, yet complex, job. We no longer think twice about speaking to our digital devices, clicking on recommended products from online stores, or using language translation apps and websites.

    Since the 1980s neural networks have been used as the mathematical model for machine learning. Inspired by the structure of our brains, each "neuron" is a simple mathematical calculation, taking numbers as input and producing a single number as an output. Originally the neural networks consisted of just one or two layers of neurons due to the computational complexity of the training process. But since the early 2000s deep neural networks consisting of many layers have been possible, and are now used for tasks that vary from pre-screening job applications to revolutionary approaches in health care.
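A single artificial neuron of this kind can be written down in a few lines. The weights, bias and inputs below are made-up numbers, purely for illustration:

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of the inputs plus a bias,
    passed through a nonlinear activation function (here the sigmoid)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))  # squashed to lie between 0 and 1

# Two input numbers feeding one neuron; all values are made up.
print(neuron([0.5, -1.2], weights=[0.8, 0.3], bias=0.1))
```

A neural network chains many of these together: the outputs of one layer of neurons become the inputs of the next, and "training" means adjusting the weights and biases until the network's outputs are useful.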

    Deep learning is increasingly important in many areas both outside and inside science. Its usefulness has been proven, but there are still a lot of unanswered questions about the theory of why such deep learning approaches work. And that is why the Isaac Newton Institute (INI) in Cambridge is running a research programme called Mathematics of deep learning (MDL), which aims to understand the mathematical foundations of deep learning.

    To introduce you to the ideas involved, and to understand more about what the MDL programme is all about, we are putting together a collection of articles and podcasts over the next six months. Start by reading our introductions to the area, and stay tuned for more!

    Maths in a minute: Artificial neurons — When trying to build an artificial intelligence, it makes sense to mimic the human brain. Artificial neurons do just that.

    Maths in a minute: Machine learning and neural networks — Machine learning makes many daily activities possible, but how does it work? Find out in this brief introduction.

    Maths in a minute: Gradient descent algorithms — Whether you're lost on a mountainside, or training a deep neural network, you can rely on the gradient descent algorithm to show you the way!

    What is machine learning? — Find out how a little bit of maths can enable a machine to learn from experience in this more in-depth introduction from the Plus library.

    The agent perspective — In this collection from the Plus library, deep learning pioneer Yoshua Bengio explains why he thinks that true artificial intelligence will only be possible once machines have something babies are born with: the ability to interact with the world, observe what happens, and adapt to the consequences of their actions.

    Seeing traffic through new eyes — Traffic is not just annoying, it can also come at a high cost to human and environmental health. This article from the Plus library explores how a form of deep learning is being used to help understand traffic, so that suitable city planning can tame it.

    Artificial intelligence takes on COVID-19 — In this article from the Plus library, we find out how researchers from the AIX-COVNET project are developing a tool that will utilise deep learning to help diagnose COVID-19.

    If you'd like more mathematical detail then you can watch talks from the MDL programme on the INI website. To see more Plus articles about machine learning click here.

    We produced this collection of content as part of our collaboration with the Isaac Newton Institute for Mathematical Sciences (INI), an international research centre and our neighbour here on the University of Cambridge's maths campus. INI attracts leading mathematical scientists from all over the world, and is open to all. Visit to find out more.

    INI logo

  5. You are out on a mountain and the mist has descended. You can't see the path, let alone where it leads, so how on Earth do you find your way down to the safety of the village at the bottom of the mountain?

    Never fear! Maths has come to your rescue! You can use a gradient descent algorithm – a mathematical technique for finding the minimum of smooth (i.e. differentiable) functions!

    The mist is so thick that you can't see into the distance, so instead you have to work locally. You can explore the ground very close to where you are standing, figure out which direction is downhill, and head that way. That's how a gradient descent algorithm works, though instead of feeling which way is downhill with your feet and body, you find the downhill gradient mathematically using calculus. (Exactly what the algorithm looks like depends on the problem; you can read some of the details on Wikipedia, which is also where we first came across the lovely misty mountain analogy.)
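    The idea can be sketched in a few lines of Python. The function, its gradient and the step size below are illustrative choices of our own, not anything prescribed by the algorithm itself:

    ```python
    # A minimal sketch of gradient descent on an invented smooth function
    # f(x, y) = (x - 1)**2 + (y + 2)**2, whose minimum sits at (1, -2).

    def grad(x, y):
        # The gradient of f, worked out by hand with calculus: (df/dx, df/dy).
        return (2 * (x - 1), 2 * (y + 2))

    def gradient_descent(x, y, step=0.1, iterations=200):
        for _ in range(iterations):
            gx, gy = grad(x, y)
            # Take a small step in the downhill direction,
            # i.e. against the gradient.
            x, y = x - step * gx, y - step * gy
        return x, y

    x, y = gradient_descent(5.0, 5.0)
    # x, y converge towards the minimum at (1, -2).
    ```

    The `step` parameter controls how far you stride each time: too small and progress is slow, too large and you can overshoot the valley floor entirely.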

    Reflections in Loch Katrine

    Maths can find the way!

    If we are using this algorithm on a mountainside we are working our way over a two-dimensional surface. You can easily imagine the algorithm going wrong: you end up in some local minimum where every direction is uphill, stuck in a hole, a pit of despair halfway down the mountain rather than the safety of the village at the bottom. In some mathematics problems you might be happy with a local minimum, and the gradient descent algorithm is a reliable way to find one. But if what you want is the global minimum, you might have to move to higher-dimensional ground.
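    Here is a sketch of that pit of despair, using an invented one-variable function that has both a shallow local minimum and a deeper global one. Which valley you end up in depends entirely on where you start walking:

    ```python
    # Illustrative function with a local minimum near x = 0.96
    # and a lower, global minimum near x = -1.04.
    def f(x):
        return (x**2 - 1)**2 + 0.3 * x

    def df(x):
        # Its derivative, found with calculus.
        return 4 * x * (x**2 - 1) + 0.3

    def descend(x, step=0.01, iterations=2000):
        for _ in range(iterations):
            x -= step * df(x)
        return x

    trapped = descend(2.0)      # starts on the "wrong" side of the hill
    lucky = descend(-2.0)       # starts on the side of the deeper valley
    # trapped settles in the shallow valley: every direction from there
    # is uphill, so the algorithm stops, even though f(lucky) is lower.
    ```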

    A saddle point

    A saddle point on a surface, the red line tracing the progress of a gradient descent algorithm

    On a two-dimensional mountainside you have a limited number of directions to choose from: North, South, East and West and combinations of these. But in higher-dimensional spaces there are many more directions to try. And thankfully it turns out that you can show mathematically that you'd have to be really unlucky to end up in a place that is a local minimum in every possible direction. Even if in many of the directions it looks like you are in a hole (in a local minimum), you can be almost certain that at least one of the many possible directions will lead you out downhill (which means you're at what's called a saddle point, rather than in a hole), and you can carry on moving towards the safety at the bottom. The gradient descent algorithm therefore has a really good chance of leading you to the global minimum.
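    The textbook example of a saddle is the surface f(x, y) = x² − y², which curves uphill along the x direction and downhill along y. A small sketch (our own choice of function and step size) shows how even a tiny nudge into the downhill direction lets gradient descent escape:

    ```python
    # Gradient descent on the saddle surface f(x, y) = x**2 - y**2,
    # whose gradient is (2x, -2y). The saddle point is at (0, 0).
    def descend(x, y, step=0.1, iterations=100):
        for _ in range(iterations):
            x, y = x - step * 2 * x, y + step * 2 * y
        return x, y

    stuck_x, stuck_y = descend(1.0, 0.0)   # started exactly on the ridge:
                                           # slides to the saddle and stops
    free_x, free_y = descend(1.0, 1e-6)    # a tiny nudge in y: the y
                                           # coordinate grows and the walker
                                           # escapes downhill
    ```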

    We came across the gradient descent algorithm in the context of machine learning, where a neural network is trained to do a certain task by minimising the value of a cost function as you move over a mathematical surface. In this case the mathematical surface is the space of possible configurations of the weights given to the connections between neurons in your neural network. (You can read a brief introduction to machine learning and neural networks here.) The gradient descent algorithm is often used in machine learning to find the optimal configuration of weights for the neural network.
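    As an illustrative sketch (not the setup of any particular network), here is gradient descent adjusting the two "weights" of the simplest possible model, a straight line fitted to invented data. The cost function here is the mean squared error, and the surface being descended is the plane of possible (w, b) configurations:

    ```python
    # Toy data generated from the line y = 2x + 1.
    data = [(x, 2 * x + 1) for x in [-2, -1, 0, 1, 2]]

    w, b = 0.0, 0.0  # initial weight and bias
    step = 0.05
    for _ in range(1000):
        # Gradient of the mean squared error cost with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        # Step downhill on the cost surface.
        w, b = w - step * gw, b - step * gb
    # w and b head towards the true values 2 and 1.
    ```

    A real deep network works the same way in spirit, just with millions of weights instead of two, which is exactly why the geometry of high-dimensional surfaces matters.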

    The reliability of the gradient descent algorithm in higher dimensions is the solution to something called the curse of dimensionality. We spoke to Yoshua Bengio, one of the pioneers of deep learning, in which neural networks are incredibly complex, with many layers, leading to very high-dimensional spaces to explore in the search for the best mathematical function to perform a machine learning task.

    It had been thought that it would be impossible to optimise these bigger neural networks. The fear was that machine learning algorithms, such as the gradient descent algorithm, would always get stuck in some sub-optimal solution (one of those local minima, or pits of despair, on the side of the mountain) rather than reach the best possible solution. But in fact Bengio and colleagues were able to prove mathematically that in these higher dimensions you are more likely to find yourself at a saddle point than at a local minimum; and in fact machine learning performance gets better with bigger neural networks.

    So whether you're lost on a mountainside, or training a neural network, you can rely on the gradient descent algorithm to show you the way.

    Read more about machine learning and its many applications on Plus.

    This article was produced as part of our collaboration with the Isaac Newton Institute for Mathematical Sciences (INI) – you can find all the content from the collaboration here.
