09.10.2023

Analysis of Debian kernel errors. Troubleshooting

Introduction

Kernel errors represent a formidable challenge in the realm of computer systems, eliciting both frustration and concern among users of all levels of expertise. Situated at the core of any operating system, the kernel assumes a pivotal role in orchestrating hardware resources and safeguarding system stability. Yet, when these intricate kernels encounter errors, the consequences can be dire, ranging from system crashes to data loss and pervasive system instability.

Within this comprehensive guide, we shall embark on an exploration into the intricate world of troubleshooting kernel errors. Our journey will entail an examination of the common origins of these errors, the far-reaching repercussions they impart upon computer systems, and, most crucially, the arsenal of techniques and methodologies at our disposal for the proficient diagnosis and resolution of these enigmatic issues. Whether one's role involves system administration, software development, or merely a passionate curiosity for the inner workings of computing systems, mastering the art of addressing kernel errors is a skill of immeasurable value, offering the means to ensure the vitality and dependability of one's digital domains.

First stage: Errors

Problems in the OS can appear unexpectedly: slow disk I/O, a device that disappears, high CPU load. For a modern server, any of these has a critical impact and is unacceptable. That is why we have various system utilities that monitor and log events. Syslog is the system journal, which helps investigate incidents involving software, daemons, filesystems, the network and other issues. However, it does not by itself produce messages about devices and drivers. That gap in monitoring is filled by the kernel's own logging facility. The kernel writes its messages, including information about connected devices and their drivers, to a special channel called the ring buffer. Because the ring buffer has a fixed size, the oldest messages are overwritten as new ones arrive.

To read messages from the kernel we can use dmesg, a utility with various options for working with this log:

dmesg

Screenshot №1 — Message from kernel

The screen will display everything currently held in the kernel ring buffer. That much raw data is hard to work with, so dmesg provides options that adjust what is printed and how:

dmesg -H

Screenshot №2 — Use option H

The -H option produces human-readable output: timestamps become relative and easier to read, messages are colored, and the result is shown in a pager. But we still don't see new messages as they arrive. Let's fix that by typing the command below:

dmesg -H -w

Screenshot №3 — Use option H and w

To show only messages from the kernel itself, use the -k option. It excludes userspace messages that were written to the same buffer. Note that dmesg prints messages of every severity by default, including informational ones, and -k does not change that:

dmesg -k -H

Screenshot №4 — Use option H and k
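
To narrow the output down to problems only, dmesg can also filter by severity. The snippet below is a minimal sketch, assuming the util-linux version of dmesg with its --level and --ctime options; the function name show_kernel_problems is our own and is not part of any tool.

```shell
# Hypothetical helper: print only kernel messages of severity "warn" or worse,
# with wall-clock timestamps instead of seconds since boot.
show_kernel_problems() {
    dmesg --kernel --level=emerg,alert,crit,err,warn --ctime
}
```

Run show_kernel_problems as root after defining it. Keep in mind that the timestamps printed by --ctime may be inaccurate if the machine has been suspended and resumed.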

To print the messages and then clear the kernel buffer, type the following as root (use -C instead to clear the buffer without printing it):

dmesg -c

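To set the size of the buffer that dmesg uses when querying the kernel ring buffer, use -s:

dmesg -s 4096

Important to know! This option does not resize the kernel ring buffer itself. The ring buffer lives in memory, not on disk, and its size is fixed at boot (it can be raised with the log_buf_len kernel parameter). A larger ring buffer costs more RAM, but it is helpful when a machine in trouble produces messages so quickly that older ones would otherwise be overwritten.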

We can also use a pipeline to filter the log, for example:

dmesg -H | grep -i "error"
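
A single grep finds one keyword at a time. As a rough first overview, the same pipeline idea can count how often several keywords occur. This is only a sketch: the function name count_kernel_keywords and the keyword list are our own assumptions, and a keyword match does not always mean a real fault.

```shell
# Hypothetical helper: read kernel log text on stdin and print how many
# lines mention each keyword (case-insensitive).
count_kernel_keywords() {
    log=$(cat)
    for word in error fail warn panic; do
        # grep -c exits non-zero when nothing matches, hence "|| true"
        count=$(printf '%s\n' "$log" | grep -c -i -- "$word" || true)
        printf '%s: %s\n' "$word" "$count"
    done
}
```

Typical usage would be dmesg | count_kernel_keywords, followed by a targeted grep for whichever keyword shows a non-zero count.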

In this way we can monitor the status of devices, drivers and the kernel. In the next stage, let's consider the main kernel problems that can appear suddenly.

Solve the problem

In Linux there are several types of critical kernel-related errors that we should know about:

If you encounter the error message Segmentation fault unexpectedly, it could signal an issue with the driver. In such cases, you have the option to either update the driver or uninstall it and replace it with a different one.

In that case, first of all, we should check which modules are loaded into the kernel with the command:

lsmod

If your driver or module appears in that list, it has been loaded but may not be working properly. Reinstall the driver in the usual way through the apt-get or aptitude utilities, or compile it from source code.
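
Scanning the lsmod list by eye is error-prone on a server with many modules, so the check can be scripted. The following is a minimal sketch: the function name is_module_loaded is hypothetical, and e1000e in the usage note is only an example module name.

```shell
# Hypothetical helper: succeed if the named module appears in lsmod-style
# output supplied on stdin. The first line of that output is a header,
# so it is skipped.
is_module_loaded() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}
```

For example, lsmod | is_module_loaded e1000e && modinfo e1000e prints the module's details only when it is loaded. Note that lsmod lists module names with underscores, even when the module file name contains dashes.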

Similarly, should you come across the error message Divide by zero, it might suggest a software bug. In this scenario, you can consider updating the software or adjusting the system configuration to address the issue.

In that case we should look for the problem in the module being used, because the probability that the kernel itself is working incorrectly is minimal. If the software was installed together with its own module, reinstall it, and remember to use the purge option!

apt purge software_name

Then use autoremove to free disk space taken by libraries that are no longer needed:

apt autoremove
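
The purge, cleanup and reinstall steps can be chained so that each one runs only if the previous one succeeded. This is a sketch under the assumption that the package can simply be installed again from the configured repositories; reinstall_package is a hypothetical name, and software_name remains a placeholder.

```shell
# Hypothetical helper: purge a package together with its configuration,
# remove dependencies nothing else needs, then install a fresh copy.
# Must be run as root; stops at the first step that fails.
reinstall_package() {
    apt purge -y "$1" &&
        apt autoremove -y &&
        apt install -y "$1"
}
```

It would be called as reinstall_package software_name. The -y flag answers the confirmation prompts automatically, so double-check the package name before running it.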

Furthermore, if you happen to see the error message Kernel panic, it may indicate a critical problem with the kernel. In response, you can attempt to resolve the issue by either rebooting the system or restoring it from a previously saved backup. If you have cloned your system, start the machine, enter the BIOS, and change the boot device to the backup image. In Serverspace, go to the main page and choose the server you need:

Screenshot №5 — Main page

Then switch to the Recovery tab and change the boot mode to Boot from recovery image:

Screenshot №6 — Recovery

The next boot will then load the recovery image. But what should we do if that doesn't help? If the kernel has panicked and the default message is not enough to solve the problem, install kdump from rescue mode. This kexec-based crash dump mechanism captures the kernel's memory at the moment of the crash, including its data and variables, and so gives much more thorough information about the problem.
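
On Debian, kdump is usually provided by the kdump-tools package (apt install kdump-tools). It can only capture a dump if memory was reserved for the crash kernel at boot through a crashkernel= parameter. The sketch below checks for that parameter; the function name has_crashkernel is hypothetical.

```shell
# Hypothetical helper: succeed if a kernel command line read from stdin
# reserves memory for a crash kernel (contains a crashkernel=... parameter).
has_crashkernel() {
    tr ' ' '\n' | grep -q '^crashkernel='
}
```

After installing the package and rebooting, has_crashkernel < /proc/cmdline && kdump-config show should report the current kdump configuration. With the default Debian settings, crash dumps are typically written under /var/crash.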

Conclusion

Navigating the intricacies of kernel errors within computer systems is a challenging endeavor, one that demands our attention and expertise. These errors, often unforeseen and disruptive, can manifest in various ways, affecting system performance and stability. To address these issues effectively, we have explored the tools at our disposal, such as syslog and dmesg, to monitor and analyze kernel messages, providing valuable insights into system health.