By Adrian | May 12, 2020
I was so excited about all the cool new features in TheHive v4.0.0-RC2 that I went straight to my lab to give it a spin. Little did I know that my system was broken before I even started, and I spent the best part of a few hours trying to figure out what exactly had happened. For a brief moment I did consider burning the lab down and just rebuilding it, but I asked myself: what would happen if this were a prod system? With that thought, I persisted until I found the root cause.
TLDR: Java.
So what happened
When I went to upgrade my instance, the first thing I did was check the status of the TheHive service before shutting it down for the upgrade.
$ sudo service thehive status
# Output
● thehive.service - Scalable, Open Source and Free Security Incident Response Solutions
Loaded: loaded (/usr/lib/systemd/system/thehive.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-05-12 07:34:36 UTC; 51s ago
Docs: https://thehive-project.org
Process: 4364 ExecStart=/opt/thehive/bin/thehive -Dconfig.file=/etc/thehive/application.conf -Dlogger.file=/etc/thehive/logback.xml -Dpidfile.path=/dev/null (code=exited, status=255
Main PID: 4364 (code=exited, status=255)
May 12 07:34:27 thehive4 systemd[1]: Started Scalable, Open Source and Free Security Incident Response Solutions.
May 12 07:34:36 thehive4 systemd[1]: thehive.service: Main process exited, code=exited, status=255/n/a
May 12 07:34:36 thehive4 systemd[1]: thehive.service: Failed with result 'exit-code'.
lines 1-10/10 (END)
It seems the service had crashed and couldn’t start successfully (code=exited, status=255). Naturally I went straight to /var/log/thehive/application.log for clues. The errors that Java pumps out are enough to make you puke, but I’ve pulled out the relevant lines:
2020-05-12 10:19:29,243 [ERROR] from akka.actor.OneForOneStrategy in application-akka.actor.default-dispatcher-5 - Unable to provision, see the following errors:
1) Error injecting constructor, java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
...
...
1 error
akka.actor.ActorInitializationException: akka://application/user/notification-actor: exception during creation
...
...
Caused by: com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) Error injecting constructor, java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
...
...
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
...
...
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042
These errors hinted at a problem with Cassandra: the failed connection is to port 9042 (Cassandra’s default CQL port), and the trace references CQLStoreManager and janusgraph.
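If you would rather not scroll through the whole stack trace, something like the sketch below pulls out just the exception chain. The function name caused_by is my own invention for illustration, and the log path in the usage comment is an assumption about where your install writes its log.

```shell
# Print only the "Caused by:" lines from a Java stack trace.
# $1 is the log file to scan. The last line printed is the innermost cause,
# which is usually the one worth chasing.
caused_by() {
    grep -E '^[[:space:]]*Caused by:' "$1"
}

# Usage (path is an assumption, adjust for your install):
#   caused_by /var/log/thehive/application.log | tail -n 1
```

In my case the innermost cause would have been the NoHostAvailableException line, which points at Cassandra rather than TheHive itself.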
Troubleshooting Cassandra
Given that the logs led me to Cassandra being part of the issue, I ran the following commands to check connectivity to it.
$ cqlsh
# Output
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
Well, that’s a bit of an issue: it seems I can’t connect to Cassandra. I know that cqlsh should connect and at least show a banner.
Next I tried the following command to try to get any information about what was happening.
$ nodetool status
# Output
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
So the same thing is happening; there are definitely connectivity issues here.
I thought perhaps the cassandra service had not started, but when I checked, the service was active (exited). That does not look quite right; it should be in an active (running) state.
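The active (exited) state is easy to miss, because systemd still counts the unit as active. A rough sketch of a stricter check is below: it reads the ActiveState and SubState properties that systemctl show prints and only succeeds when the process is actually still running. The helper name unit_really_running is made up for this post.

```shell
# Read "Key=Value" lines (as printed by `systemctl show`) on stdin and
# succeed only if the unit is active AND its sub-state is "running".
unit_really_running() {
    awk -F= '
        $1 == "ActiveState" { active = $2 }
        $1 == "SubState"    { sub_state = $2 }
        END { exit !(active == "active" && sub_state == "running") }
    '
}

# Usage:
#   systemctl show -p ActiveState -p SubState cassandra | unit_really_running \
#       && echo "cassandra is running" || echo "cassandra is NOT really running"
```

Run against a unit in the active (exited) state, the check fails, which is the behaviour I wanted when I first glanced at the status output.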
We can look into /var/log/cassandra/system.log and see if there are any clues there. The only thing I could see was that it had stopped listening for CQL clients and was announcing a shutdown.
INFO [StorageServiceShutdownHook] 2020-05-12 07:32:28,676 Server.java:179 - Stop listening for CQL clients
INFO [StorageServiceShutdownHook] 2020-05-12 07:32:28,679 Gossiper.java:1647 - Announcing shutdown
By the time I reached this point, I was really no closer to figuring out what the issue was or how to fix it. There were some guides online about modifying settings in the cassandra.yaml and cassandra-env.sh files, which I tried. I tried multiple combinations of localhost / 127.0.0.1 / serverIP / hostname for the settings that were mentioned, and everything still came up short.
Next Step: Reviewing the installation guide
With all those steps covered off, I decided to review my installation step by step against what I documented here and against the original install notes on TheHive Project GitHub pages.
The first step in the process is to install openjdk-8-jre-headless. Given that this was a working installation, I checked what version I had installed.
$ java -version
# Output
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)
Version 11! I guess that at some point Java got upgraded, probably through some automatic update.
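Since this check would have saved me a few hours, here is a small sketch that turns the java -version banner into a single major version number you can test in a script. The function name java_major is hypothetical. It assumes the banner has the usual version "x.y.z" shape, and it treats the legacy 1.8 numbering as major version 8.

```shell
# Extract the major Java version from `java -version` output read on stdin.
# Handles both the legacy "1.8.0_xxx" scheme (major 8) and the newer "11.0.7" scheme.
java_major() {
    awk -F'"' '/version/ {
        split($2, v, "[._]")
        print (v[1] == "1" ? v[2] : v[1])
        exit
    }'
}

# Usage (java -version writes to stderr, hence the redirect):
#   [ "$(java -version 2>&1 | java_major)" = "8" ] || echo "unexpected Java version"
```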
To switch the active Java version, I used the following command and selected java-8-openjdk-amd64 (selection 2):
$ sudo update-alternatives --config java
# Output
There are 2 choices for the alternative java (providing /usr/bin/java).
  Selection    Path                                             Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java       1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java       1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java    1081      manual mode
Press <enter> to keep the current choice[*], or type selection number:
I also checked the dpkg logs using cat /var/log/dpkg.log | grep openjdk for good measure, which showed that an install of openjdk-11-jre-headless:amd64 had happened. D’oh.
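A plain grep on the dpkg log also returns all the status and configure noise. As a sketch, the helper below keeps only the install and upgrade events for packages whose name starts with a given prefix. The name pkg_events is mine, and it assumes the standard dpkg.log layout of date, time, action, package.

```shell
# List install/upgrade events for packages whose name starts with a prefix.
# $1 = package-name prefix, $2 = dpkg log file (defaults to /var/log/dpkg.log)
pkg_events() {
    awk -v p="$1" '($3 == "install" || $3 == "upgrade") && index($4, p) == 1' \
        "${2:-/var/log/dpkg.log}"
}

# Usage:
#   pkg_events openjdk
```

Older events may have been moved into rotated log files, so an empty result does not necessarily mean nothing was installed.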
With that issue seemingly fixed, I restarted both cassandra and TheHive, and we were all up and running again.
$ sudo service cassandra restart
$ sudo service thehive restart
Now, I can start on the actual upgrade to RC2 and test out some features.