By Adrian | May 12, 2020
I was so excited at the thought of all the cool new features that have popped up in TheHive v4.0.0-RC2 that I went straight to my lab to give it a spin. Little did I know that my system was broken before I even started, and I spent the best part of a few hours trying to figure out what exactly had happened. For a brief moment I did consider burning the lab down and just rebuilding it, but then I asked myself: what would happen if this were a prod system? With that thought, I persisted until I found the root cause.
TLDR: Java.
So what happened
When I went to upgrade my instance, the first thing I did was check the status of TheHive service before shutting it down for the upgrade.
$ sudo service thehive status
# Output
● thehive.service - Scalable, Open Source and Free Security Incident Response Solutions
Loaded: loaded (/usr/lib/systemd/system/thehive.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-05-12 07:34:36 UTC; 51s ago
Docs: https://thehive-project.org
Process: 4364 ExecStart=/opt/thehive/bin/thehive -Dconfig.file=/etc/thehive/application.conf -Dlogger.file=/etc/thehive/logback.xml -Dpidfile.path=/dev/null (code=exited, status=255
Main PID: 4364 (code=exited, status=255)
May 12 07:34:27 thehive4 systemd[1]: Started Scalable, Open Source and Free Security Incident Response Solutions.
May 12 07:34:36 thehive4 systemd[1]: thehive.service: Main process exited, code=exited, status=255/n/a
May 12 07:34:36 thehive4 systemd[1]: thehive.service: Failed with result 'exit-code'.
lines 1-10/10 (END)
It seems the service had crashed and couldn’t start successfully (code=exited, status=255). Naturally I went straight for /var/log/thehive/application.log for clues. The errors that Java pumps out are enough to make you puke, but I’ve pulled out the relevant lines:
2020-05-12 10:19:29,243 [ERROR] from akka.actor.OneForOneStrategy in application-akka.actor.default-dispatcher-5 - Unable to provision, see the following errors:
1) Error injecting constructor, java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
...
...
1 error
akka.actor.ActorInitializationException: akka://application/user/notification-actor: exception during creation
...
...
Caused by: com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) Error injecting constructor, java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
...
...
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
...
...
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042
These errors hinted at a problem with Cassandra: port 9042 is the port Cassandra listens on for CQL clients, and CQLStoreManager is the JanusGraph class that connects to it.
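Rather than scrolling through the whole stack trace, the chain of causes could be pulled out with grep. This is a minimal sketch, assuming the log lives at /var/log/thehive/application.log; `root_causes` is a helper name made up for illustration and is not part of TheHive.

```shell
# Print only the "Caused by:" lines from a Java stack trace,
# de-duplicated while keeping the order in which they first appear.
# root_causes is a hypothetical helper name.
root_causes() {
    grep 'Caused by:' "$1" | awk '!seen[$0]++'
}

# Usage (the path is an assumption, adjust it to your install):
# root_causes /var/log/thehive/application.log
```

The last line printed is usually the most useful one, since the innermost cause sits at the bottom of the chain.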
Troubleshooting Cassandra
Given that the logs led me to Cassandra as part of the issue, I ran the following commands to check connectivity to it.
$ cqlsh
# Output
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
Well, that’s a bit of an issue: it seems I can’t connect to Cassandra. I know that cqlsh should connect and at least show a banner.
Next I ran the following command to get more information about what was happening.
$ nodetool status
# Output
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
So the same thing is happening; there are definitely connectivity issues here.
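Both failures are plain "connection refused" errors, so a quick check of whether anything is listening on those ports can confirm the picture without going through cqlsh or nodetool. The following is a sketch only: `port_open` is a hypothetical helper, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command are available.

```shell
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# port_open is a hypothetical helper name; it relies on bash's /dev/tcp
# pseudo-device and the coreutils timeout command.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# 9042 is the CQL port and 7199 the JMX port that appear in the errors.
for port in 9042 7199; do
    if port_open 127.0.0.1 "$port"; then
        echo "port $port: open"
    else
        echo "port $port: closed"
    fi
done
```

If both ports report closed, the Cassandra process is most likely not running at all, whatever the service manager says.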
I thought perhaps the Cassandra service had not started, but when I checked I could see that the service was active (exited). That does not look quite right. It should be in an active (running) state.
Next I looked in /var/log/cassandra/system.log to see if there were any clues there. The only thing I could see was that it had stopped accepting connections and was announcing a shutdown.
INFO [StorageServiceShutdownHook] 2020-05-12 07:32:28,676 Server.java:179 - Stop listening for CQL clients
INFO [StorageServiceShutdownHook] 2020-05-12 07:32:28,679 Gossiper.java:1647 - Announcing shutdown
By the time I reached this point, I was really no closer to figuring out what the issue was or how to fix it. There were some guides online about modifying some settings within the cassandra.yaml and cassandra-env.sh files which I tried. I tried multiple combinations of localhost / 127.0.0.1 / serverIP / hostname for the settings that were mentioned and everything still came up short.
Next Step: Reviewing the installation guide
With all those steps covered off, I decided to compare my installation, step by step, against what I documented here and against the original install notes on TheHive Project's GitHub pages.
The first step in the process is to install openjdk-8-jre-headless. Given that this was a working installation, I checked what version I had installed.
$ java -version
# Output
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)
Version 11! I guess that at some point Java got upgraded, probably through some automatic update.
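A check like this is easy to script so that a changed Java version gets noticed before the services do. This is a minimal sketch, assuming Java 8 is the version you want (as the install guide calls for); `java_major` is a hypothetical helper name.

```shell
# Extract the major Java version from `java -version` output on stdin.
# Handles the legacy "1.8.0_252" scheme (major is the second field)
# and the newer "11.0.7" scheme (major is the first field).
# java_major is a hypothetical helper name.
java_major() {
    awk -F'"' '/version/ {
        split($2, v, ".")
        print (v[1] == "1" ? v[2] : v[1])
        exit
    }'
}

# java -version writes to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
    major=$(java -version 2>&1 | java_major)
    [ "$major" = "8" ] || echo "warning: active Java is $major, expected 8"
fi
```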
To switch the active Java version I used the following command and selected java-8-openjdk-amd64:
$ sudo update-alternatives --config java
# Output
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number:
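The same selection can also be made without the interactive prompt, which is handy if the fix needs to be scripted. This is a sketch under assumptions: `pick_java8` is a made-up helper, and it assumes the Java 8 path contains java-8-openjdk as it does in the listing above.

```shell
# Read alternative paths on stdin (one per line, as printed by
# `update-alternatives --list java`) and print the first Java 8 one.
# pick_java8 is a hypothetical helper name.
pick_java8() {
    grep -m 1 'java-8-openjdk'
}

# Usage (needs root, so it is left commented out here):
# sudo update-alternatives --set java "$(update-alternatives --list java | pick_java8)"
```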
I also checked the dpkg logs using cat /var/log/dpkg.log | grep openjdk for good measure, which showed that an install of openjdk-11-jre-headless:amd64 had happened. D'oh.
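A plain grep for openjdk also matches the many status lines dpkg writes, so narrowing the search to install and upgrade events can make the culprit easier to spot. This is a sketch, assuming the usual dpkg.log layout of date, time, action, package and versions; `openjdk_events` is a hypothetical helper name.

```shell
# Print install/upgrade events for openjdk packages from dpkg log
# lines on stdin. Assumes the usual dpkg.log layout:
#   date  time  action  package  versions...
# openjdk_events is a hypothetical helper name.
openjdk_events() {
    awk '($3 == "install" || $3 == "upgrade") && $4 ~ /^openjdk-/'
}

# Usage: zcat -f passes plain files through unchanged and
# decompresses rotated .gz logs.
# zcat -f /var/log/dpkg.log* | openjdk_events
```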
With that issue seemingly fixed, I restarted both Cassandra and TheHive and we were all up and running again.
$ sudo service cassandra restart
$ sudo service thehive restart
Now, I can start on the actual upgrade to RC2 and test out some features.