By Marianne Ryde for the magazine Dynamo. Photo: De Wood, Pooley, Wikimedia Commons
Infection with the bacterium campylobacter most often comes from our food and can cause stomach infections with symptoms such as nausea, diarrhea, and fever. In recent years, the number of campylobacter infections has been rising. In 2023, more than 5,000 cases were registered in Denmark.
An important tool for the food authorities in tackling the disease-causing bacterium is the so-called source account, which the DTU National Food Institute prepares. It estimates what proportion of illness cases comes from different animals and foods, which tells the authorities where to take preventive action and, afterwards, how effective that action has been.
More data to keep track of
The source account is based on the food authorities' samples from animals and food, together with data from Statens Serum Institut on samples from people infected with the bacterium. The principle is to divide the identified bacteria into genetic subtypes and build a model from the patterns that emerge.
Such a source account has been prepared for salmonella for many years, where you can 'get away with' keeping an eye on relatively few subtypes. Campylobacter is a more complex organism, and to track it accurately it was necessary to sequence (map) the bacterium's entire core genome of approximately 1,300 genes.
"Of course, when we go from fewer than 20 to 1,300 genes, the amount of data becomes much larger and more difficult to keep track of. That's why we came up with the idea of using machine learning a few years ago. At the time, few others had tried," says Professor Tine Hald. She leads a group of researchers at the DTU National Food Institute who, among other things, are responsible for preparing the source accounts.
Master's student gets started
As a master’s student at DTU, Maja Lykke Brinch started developing a machine learning solution. She 'fed' campylobacter gene sequences from different animals and foods into a supercomputer.
"You typically take 70 per cent of your dataset, in this case campylobacter gene sequences from multiple sources collected in 2015-17, and train the algorithm on it. Then you give it the remaining 30 per cent, where you know the source, and see if the machine can get it right. Once you have a sufficiently accurate model, you provide it with data from people with bacterial infections where neither we nor the model know where the disease originates. The model then predicts the probability that a case of infection originates from a specific food source," explains Maja Lykke Brinch, who has been responsible for a significant part of the work.
She is now a PhD student at DTU and first author of a scientific article that compares different calculation methods and concludes that machine learning is the most useful method for campylobacter. The algorithm identifies the correct source in 98 per cent of cases.
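The train-and-test procedure Maja Lykke Brinch describes can be sketched in a few lines of Python. This is a minimal illustration, not the model used in the DTU study: it assumes each isolate is summarised as an allele profile (one allele number per core-genome gene), uses made-up data for four hypothetical sources, and swaps in a simple k-nearest-neighbour classifier. The last step, averaging the per-case probabilities into source shares, is likewise an assumed simplification of how a source account could be put together.

```python
import random
from collections import Counter

# Hypothetical toy setup: each isolate is an allele profile (one allele number
# per core-genome gene) plus its known source. Real profiles have ~1,300 genes.
N_GENES = 20
SOURCES = ["chicken", "cattle", "duck", "pig"]


def make_isolate(source_index, rng):
    """Synthetic isolate: the source's typical profile with 3 genes changed."""
    profile = [source_index * 100 + gene for gene in range(N_GENES)]
    for gene in rng.sample(range(N_GENES), 3):
        profile[gene] = rng.randint(1000, 9999)  # a "new" allele at this gene
    return profile


def make_dataset(n_per_source, seed=1):
    """Shuffled list of (profile, source) pairs with known sources."""
    rng = random.Random(seed)
    data = [(make_isolate(i, rng), source)
            for i, source in enumerate(SOURCES)
            for _ in range(n_per_source)]
    rng.shuffle(data)
    return data


def allele_distance(a, b):
    """Number of genes at which two isolates carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def predict_proba(train, profile, k=5):
    """k-nearest-neighbour vote: share of the k closest isolates per source."""
    nearest = sorted(train, key=lambda item: allele_distance(item[0], profile))[:k]
    votes = Counter(source for _, source in nearest)
    return {source: votes[source] / k for source in SOURCES}


# 1. Split isolates with known sources: 70 per cent to train, 30 per cent held out.
data = make_dataset(n_per_source=50)
cut = round(len(data) * 0.7)
train, test = data[:cut], data[cut:]

# 2. Check how often the most probable source is the true one.
hits = 0
for profile, true_source in test:
    probs = predict_proba(train, profile)
    if max(probs, key=probs.get) == true_source:
        hits += 1
accuracy = hits / len(test)

# 3. Human cases: the source is unknown, so the model only gives probabilities.
#    Averaging them over all cases is one simple estimate of the share of
#    illness attributable to each source.
rng = random.Random(2)
human_cases = ([make_isolate(0, rng) for _ in range(8)]
               + [make_isolate(3, rng) for _ in range(2)])
per_case = [predict_proba(train, p) for p in human_cases]
attribution = {s: sum(c[s] for c in per_case) / len(per_case) for s in SOURCES}

print(f"held-out accuracy: {accuracy:.2f}")
print("estimated share per source:", attribution)
```

Because the made-up sources are cleanly separated, the held-out accuracy here is close to 100 per cent and the attribution recovers the 8:2 chicken-to-pig mix of the invented human cases; real sequence data are far noisier, which is why the choice of method matters.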
Campylobacter now and in the future
Chicken, duck, turkey, cattle, pigs, deer, dogs, and cats: the bacterium is all around us, and people can become infected through contact with animals or with soil from places where the animals have been. Infection can also occur through water from private wells or through sewage overflows caused by heavy rain.
One of the main routes of infection for salmonella is from parent animals to their offspring, for example via the egg. Campylobacter does not spread in that way and is not found in bird eggs. Instead, it comes from the surroundings, including wild birds, and its occurrence varies with the seasons: virtually all outdoor chicken flocks are infected in summer, and this also applies to organic chickens. Humans also carry campylobacter into the barns, and it is very difficult to eradicate. However, high heat kills the bacterium, so good hygiene and thorough heat treatment of food can prevent illness.