Let me start off by saying that I normally do not post on forums in search of help. I'd rather waste hours of my own time than someone else's. However, I've exhausted all of my resources with this recent issue and my Google-fu has let me down. I will say up front that I have learned a major lesson from this about keeping a backup of the root system. I also learned while reading around that snapper works with ext4 (oh, how I wish I had made a snapshot of my system). I really hope that I don't have to reformat, since I would lose many configurations in the main system and it would probably take me at least a week to reinstall software and reconfigure anything not saved in /home (but that will be the price I pay for my incompetence and oversight). So with that said, let me begin by describing the problem and providing as much information as I can.
The problem started this past Saturday, 9/29, after a `zypper dup`. Not encountering any blatantly obvious issues, I rebooted as I usually do after updating. The update upgraded my kernel to 4.18.8 and everything looked fine until the boot just sorta stopped: it neither reached the KDE login screen nor dropped me into emergency mode. I can't access any of the tty environments to log in either. Instead, the boot output simply ends with `[ OK ] Started Update system wide CA certificates` as the last successful line. Reviewing the prior output, I can see that the root filesystem was mounted successfully and that my /home partition, which is on a separate drive, was mounted after it was decrypted. I originally noticed warnings that the wicked service failed to start and that my NFS shares failed to mount (for obvious reasons), but I believe I've since fixed this. Using Alt + F1, I can see the following:
- a few mce [Hardware Error] entries (I've had these from before and I believe the CPU is correcting them before boot)
- an entry for [FAILED] Failed to start Setup Virtual Console (I think this is related to /etc/vconsole.conf)
- an entry for Starting Show Plymouth Boot Screen...
So, not having any clue where to start, I started with the basics. First, I tried to boot into the recovery/fallback kernel through the Advanced options in grub. I also tried the other kernels that were available to me (4.18.5 and 4.17.9) along with their respective recovery/fallback entries. I even tried editing the grub entry for each boot, removing extra kernel options and enabling debug. I had absolutely no luck with any of these, so my next thought was to make myself a new live USB and try an upgrade from it. The upgrade installed some new files and removed some obsolete ones, but after rebooting I was back in the same scenario. At this point I let the system sit for a while, wondering if it was just taking a really long time for some unknown reason. Needless to say, that didn't help.
Next on the list was to repair the bootloader and check the filesystem. I booted off the live USB again, ran fsck on the system drive, then mounted the system, chrooted in, and rebuilt grub with `grub2-mkconfig -o /boot/grub2/grub.cfg` and `grub2-install /dev/sda`. The filesystem came back fine, and the grub rebuild didn't improve my situation. So at this point I decided to boot the main system and pass the rescue kernel option to force my way into rescue mode natively. From there I started poking around in the systemd journal for any error I could find and researching each and every one.
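For anyone who wants to retrace the repair step, it amounts to roughly the following. This is a sketch rather than a transcript of my session: the function name `repair_bootloader` and the `DRY_RUN` switch are made up for illustration, the root partition is a placeholder argument (only `/dev/sda` as the grub target comes from the commands above), the bind mounts are the usual chroot preparation, and it assumes /boot lives on the root partition. By default it only prints the commands.

```shell
# Sketch of the live-USB repair sequence. Hypothetical helper, not an
# official tool. DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: root partition (placeholder; use whatever lsblk shows for /)
# $2: disk that receives the bootloader (/dev/sda in my case)
repair_bootloader() {
    root_part="$1"
    boot_disk="$2"

    # Check the unmounted root filesystem. fsck also exits non-zero when
    # it merely corrected errors, so its status is not used to abort here.
    run fsck "$root_part"

    # Mount the installed system and expose the live system's /dev, /proc, /sys
    run mount "$root_part" /mnt || return 1
    for fs in dev proc sys; do
        run mount --bind "/$fs" "/mnt/$fs" || return 1
    done

    # Regenerate the grub config and reinstall the bootloader inside the chroot
    run chroot /mnt grub2-mkconfig -o /boot/grub2/grub.cfg || return 1
    run chroot /mnt grub2-install "$boot_disk" || return 1

    # Unmount /mnt together with the bind mounts below it
    run umount -R /mnt
}
```

Calling `repair_bootloader <root-partition> /dev/sda` prints the command list for review; setting `DRY_RUN=0` first, in a root shell on the live USB, would actually run it.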
Here's the output of `journalctl -xrb -1` for anyone who wants to take the time to read through it.
https://pastebin.com/jd62ayEG
Some things I noticed looking through systemd logs:
- NVRM entries related to the Nvidia probe routine (I've tried removing and reinstalling the Nvidia proprietary drivers as well as blacklisting nouveau and using the nomodeset kernel option)
- /usr/bin/loadkeys failed with exit status 1 (I believe this is related to the console keyboard layout)
- Invalid rule /etc/udev/rules.d/50-brother-brscan4-libsane-type1.rules:9: unknown key 'SYSFS{idVendor}' (Related to my Brother printer, I think this is safe to ignore)
- EDAC sbridge: Couldn't find mci handler ... ECC is disabled (I've had this since I started using Linux; it appears to be related to my chipset)
- Process 236 (haveged) of user 0 dumped core (this seems to be related to system entropy, and I assume it dumps core because the system hangs at the end instead of proceeding to a login)
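To pull entries like these out of a long journal dump without reading every line, something along these lines works on a saved copy of the log. It is only a sketch: `scan_boot_log` is a name I made up, the file name in the usage line is a placeholder, and the patterns are just phrases taken from the entries above, so it will miss anything worded differently.

```shell
# Print, with line numbers, the lines of a saved journal dump that match
# the failure phrases listed above. Hypothetical helper for illustration.
scan_boot_log() {
    log_file="$1"
    grep -n -E 'Failed to start|failed with exit status|dumped core|Invalid rule' "$log_file"
}
```

For example, `journalctl -xrb -1 > boot.log` followed by `scan_boot_log boot.log`.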
And here is some other information that may or may not be useful.
The output of `mkinitrd`:
https://pastebin.com/cSqzFzFH
Zypper history from 9/29:
https://pastebin.com/wKSrVk1K
System:
- Tumbleweed x86_64
- Intel Core i7-5820K
- GeForce GTX 1050 2GB
- 16GB DDR4
Please let me know if there is anything else I can provide, and thanks in advance for any assistance. My fate is in the community's hands.