Introduction
Problems, errors, bugs, and servers that stop working are only a short list of what administrators, information security specialists, and others have to face. All of us need to resolve an issue as soon as it appears. But where do we start, where does the cause lie, and how do we deal with it? We will consider the main areas of potential problems: the filesystem, services, units, and so on. Because issues come in so many kinds, each has its own way of being solved. We can divide the troubleshooting process into two steps: analyzing the problem and applying a solution.
First step: Analyze
First of all, we need to identify the problem and the area of the OS it belongs to. Often we can see an error message from a program, utility, or service. For example, the access control and account management system may print an error like this on the screen:
We do not have permission to access a file created by root because it has mode 700. That means the owner, root, has the rwx rights, but the group and other users have none. To resolve this type of issue, change the mode of the file or directory in question. The system may also fail to notify you of what is happening inside, because the problem does not interrupt any process. The OS is like a black box with a million processes inside, and we need to monitor them all. For that we have log files, which help us deal with such issues.
Warning! Every system, application, or service may have its own logging utility or log file for troubleshooting. Keep this in mind!
Let's have a look at the system log, since that is where system problems are documented:
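The effect of mode 700 can be reproduced safely on a throwaway file. The following is a minimal sketch, assuming GNU coreutils (for `stat -c`); the directory and the file name `report.txt` are hypothetical and exist only for this demonstration. For a file that really belongs to root, the same `chmod` call would have to be run as root.

```shell
#!/bin/sh
# Work in a temporary directory so nothing on the real system is touched.
workdir=$(mktemp -d)
demo="$workdir/report.txt"      # hypothetical file name
touch "$demo"

# Mode 700: owner has rwx, group and others have no rights at all.
chmod 700 "$demo"
before=$(stat -c '%a' "$demo")

# Grant read access to group and others (symbolic form of the change).
chmod go+r "$demo"
after=$(stat -c '%a' "$demo")

echo "before: $before, after: $after"

# Clean up the demonstration directory.
rm -r "$workdir"
```

Running `ls -l` on the file before and after the change shows the same information in the rwx notation used above.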
journalctl | grep "login" | tail
That command consists of several stages joined by pipes, so let's explain them. The first command prints the system log, the second finds the lines that match a text pattern in that mass of data, and tail shows only the last lines. The result is the tail end of the login-related entries in the system log.
It is kind of funny to point out, but on this experimental server we did not set up a public key, and the log shows the source of a potential problem: various IP addresses are trying to get into the machine. So set up authentication with a trusted public key. The log helps to figure out the source of your problem. We can also use journalctl to check the state of a single service if we see a problem with it. Just type:
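The filter-and-tail part of the pipeline can be wrapped in a small helper so that the pattern and the number of lines become parameters. This is only a sketch; the function name `last_matches` is made up for illustration and is not part of journalctl or any other tool.

```shell
# last_matches PATTERN [COUNT]
# Reads log text on standard input, keeps the lines that contain PATTERN
# (as a fixed string), and prints the last COUNT of them (default 10,
# the same default that tail uses).
last_matches() {
    grep -F -- "$1" | tail -n "${2:-10}"
}

# Possible usage (not executed here):
#   journalctl --no-pager | last_matches login 20
```

Because the helper reads standard input, it works the same way on a saved log file, for example with `last_matches login < saved.log`.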
journalctl -xeu postgresql@15-main.service
Following the recommendation above, we can check the journal of the affected unit:
In the output above, the Subject field reports that the unit failed. Look for the line tagged FATAL. In this case the problem is in the configuration files, so the unit cannot be started.
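When the journal of a unit is long, the FATAL lines can be pulled out directly. The sketch below assumes that the service writes the word FATAL into its journal entries, as in the case described above; the function name `unit_fatal_lines` is hypothetical.

```shell
# unit_fatal_lines UNIT [COUNT]
# Prints the lines containing "FATAL" among the last COUNT journal
# entries of UNIT (default 200). Returns non-zero when nothing matches,
# because that is what grep does.
unit_fatal_lines() {
    journalctl -u "$1" -n "${2:-200}" --no-pager | grep -F 'FATAL'
}

# Possible usage (not executed here):
#   unit_fatal_lines postgresql@15-main.service
```

A non-zero exit status here only means that no FATAL line was found in the inspected entries, not that the unit is healthy.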
Second step: Solution
To check the status of your services, type:
systemctl list-units -t service
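On a machine with many services it can help to list only the failed ones. The following sketch relies on the `--state=failed`, `--no-legend`, and `--plain` options of systemctl; the function name `failed_services` is an assumption made for this example.

```shell
# failed_services
# Prints the names of all service units that are in the failed state,
# one per line. --no-legend drops the header and footer, --plain drops
# the status markers, so the unit name is always the first field.
failed_services() {
    systemctl list-units --type=service --state=failed --no-legend --plain |
        awk '{ print $1 }'
}

# Possible usage (not executed here):
#   failed_services
```

An empty result means that systemd currently reports no failed services.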
Let's troubleshoot the postgresql service. We open the config file and find the erroneous line:
Delete it with the key combination Ctrl + K (in the nano editor this cuts the current line), then save the file and restart the service:
systemctl restart postgresql@15-main.service
Right after that, we need to check the status of the service:
systemctl status postgresql@15-main.service
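The restart and the status check can be combined so that a failed restart is noticed immediately. This is a minimal sketch that uses `systemctl is-active --quiet` for the check; the function name `restart_and_verify` is hypothetical, and the commands need root privileges on a real system.

```shell
# restart_and_verify UNIT
# Restarts UNIT and reports whether it ended up in the active state.
# Returns 0 when the unit is active, 1 otherwise.
restart_and_verify() {
    if ! systemctl restart "$1"; then
        echo "restart of $1 failed" >&2
        return 1
    fi
    if systemctl is-active --quiet "$1"; then
        echo "$1 is active"
    else
        echo "$1 is not active, see: journalctl -xeu $1" >&2
        return 1
    fi
}

# Possible usage (not executed here):
#   restart_and_verify postgresql@15-main.service
```

If the function reports that the unit is not active, the journal of that unit is the next place to look, as in the first step.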
Your case may require more specific steps. Search for them using the same error message in a search engine or in a generative AI service such as GPT. In this article we only consider the monitoring tools that help you track down errors.
Also, if you cannot find the cause of a problem but your machine runs slowly, you can install a task manager, atop:
apt install atop -y
Then just type atop and wait for the full-screen view of the main running processes:
atop
In the CPU column we can see the percentage of CPU each process is using at the current time. If we see a high load, we can stop the offending process. The first command sends the interrupt signal, the same one that Ctrl + C sends:
kill -SIGINT 332697
The second command sends the termination signal:
kill -SIGTERM 332697
If the CLI does not return an error message, the command executed successfully!
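A process does not always exit after the first signal, so it can be useful to wait and then escalate. The sketch below sends SIGTERM first and falls back to SIGKILL after a timeout. Note that SIGKILL is an addition not used in the commands above, and the function name `stop_process` is hypothetical.

```shell
# stop_process PID [TIMEOUT]
# Sends SIGTERM to PID, waits up to TIMEOUT seconds (default 5) for the
# process to disappear, and sends SIGKILL if it is still there.
# Returns 1 when the first signal cannot be delivered (no such process
# or no permission).
stop_process() {
    local pid=$1
    local left=${2:-5}

    kill -TERM "$pid" 2>/dev/null || return 1

    while [ "$left" -gt 0 ]; do
        # kill -0 sends no signal, it only tests whether PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        left=$((left - 1))
    done

    # Still present after the timeout: force it to stop.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"
    fi
}

# Possible usage (not executed here), with the PID from the example above:
#   stop_process 332697 10
```

SIGKILL cannot be caught by the process, so it gets no chance to clean up. That is why the sketch tries the polite signal first.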
Conclusion
Encountering errors and challenges within a Debian-based system is a common part of system administration. This guide has delved into various aspects of identifying and resolving these issues effectively.