* Remove the pid file if runtime errors occur
* Clean up error handling and fix pid file creation
The pid file was being created before the args were evaluated, so if
incorrect args or --help were passed, a stale pid file was left behind.
This is now fixed, along with some refactoring.
* Deter other processes from using the same data dir
For more information, see #167
* Don't lock `pid_file`
Windows has mandatory locking, so a second instance wouldn't be able to
read the PID of the other process. We'll just keep the file
descriptor/handle open.
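
The pid file handling described in the commits above could be sketched
roughly as follows. This is a minimal, std-only illustration and not the
actual skyd code: the `PidFile` type, the `start` function and the use of
`create_new` to reject an existing pid file are all assumptions made for
the example.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::process;

/// Hypothetical guard: keeps the pid file handle open for as long as it
/// lives and removes the file again when it is dropped.
pub struct PidFile {
    _file: File, // kept open on purpose; no lock is taken on it
    path: PathBuf,
}

impl PidFile {
    /// Fails with `AlreadyExists` if a pid file is already present
    /// (assumed behaviour for this sketch).
    pub fn create(path: &Path) -> io::Result<Self> {
        let mut file = OpenOptions::new()
            .write(true)
            .create_new(true)
            .open(path)?;
        file.write_all(process::id().to_string().as_bytes())?;
        Ok(PidFile {
            _file: file,
            path: path.to_path_buf(),
        })
    }
}

impl Drop for PidFile {
    fn drop(&mut self) {
        // Best effort: nothing useful can be done if removal fails here.
        let _ = fs::remove_file(&self.path);
    }
}

/// Hypothetical start-up order: evaluate the args first and only then
/// create the pid file, so `--help` or bad args never leave one behind.
pub fn start(args: &[String], pid_path: &Path) -> Result<PidFile, String> {
    if args.iter().any(|a| a == "--help") {
        return Err("usage requested".to_string());
    }
    PidFile::create(pid_path).map_err(|e| format!("pid file error: {}", e))
}
```

Holding the returned guard in `main` for the whole run means any early
return caused by a runtime error drops it and removes the pid file.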
This is very useful because it removes the need for user intervention if
the save on termination fails. Say the save operation fails because
'some bad daemon' changed the directory's perms. skyd reports this
error while trying to save upon termination, and our sysadmin then fixes
the perms issue. The previous design would force the sysadmin to
_somehow_ foreground skyd and hit enter. That is silly. The new design
just retries the save operation every 10 seconds, so once the issue is
fixed, the save recovers on its own.
Why not exponential backoff?
The issue may only be fixed much later, by which point the backoff value
may have grown large: a save that could have succeeded right away would
then have to wait a long time before it is even attempted.
This also fixes a bug that caused BGSAVE errors to be reported as info
class log entries.
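
The fixed-interval retry described above might look something like the
sketch below. The function name `save_until_ok` and its signature are
made up for illustration; the real code retries every 10 seconds, while
the interval here is a parameter so the loop can be exercised quickly.

```rust
use std::fmt::Display;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: calls `save` until it succeeds, sleeping for the
/// same fixed `interval` after every failure (no backoff). Returns the
/// number of attempts that were needed.
pub fn save_until_ok<F, E>(mut save: F, interval: Duration) -> u32
where
    F: FnMut() -> Result<(), E>,
    E: Display,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match save() {
            Ok(()) => return attempts,
            Err(e) => {
                // Report the failure as an error, then wait and try again.
                eprintln!("save failed (attempt {}): {}", attempts, e);
                thread::sleep(interval);
            }
        }
    }
}
```

With a constant interval, the worst-case delay between the sysadmin
fixing the problem and the next save attempt is bounded by that interval,
which is the point made against exponential backoff.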
* Explicitly fsync and relax CPU on snap busy-loop
This commit also switches to using global `VERSION` and `URL` statics
rather than defining them per crate.
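
As a rough illustration of the two ideas in the subject line, the sketch
below pairs an explicit fsync after a write with a busy-loop that hints
the CPU while it waits. The function names and the `AtomicBool` flag are
assumptions for the example; the commit does not say how the snapshot
busy-loop is actually structured.

```rust
use std::fs::File;
use std::hint;
use std::io::{self, Write};
use std::sync::atomic::{AtomicBool, Ordering};

/// Writes `data` and then explicitly asks the OS to flush it to disk.
pub fn write_durably(file: &mut File, data: &[u8]) -> io::Result<()> {
    file.write_all(data)?;
    file.sync_all()
}

/// Spins until `busy` is cleared, telling the CPU that this is a
/// spin-wait so it can relax instead of running flat out.
pub fn wait_until_clear(busy: &AtomicBool) {
    while busy.load(Ordering::Acquire) {
        hint::spin_loop();
    }
}
```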
* Add changelog entry and bump up version
* Optimize `dbtest` macro and rm redundant allocs
* Upgrade deps
* Create a new file on writing to flock-ed file
This fix is important in two ways. Say we have a user A.
They go ahead and launch skyd. skyd creates a data.bin file. Now A just
deletes the data.bin file for fun. Funnily enough, this never causes
flock to error!
Why? Because the descriptor/handle is still valid; the file was just
unlinked from the directory. The result is that skyd exits with a
'successfully saved' notice, only for the user to find that the file no
longer exists and all of their data is lost. That's bad.
There's a hidden problem in our current approach too, apart from this.
Our writing process begins by truncating the old file and then writing
to it from offset 0. Nice, but what if this operation crashes midway?
Then we lose both the current data AND the old data. Not good.
This commit does a better thing: it creates a new temporary file, locks
it before writing and then flushes the current data to the temporary
file. Once that succeeds, it replaces the old data.bin file with the
newly created file.
This solves both the problems mentioned here for us:
1. No more of the silly error
2. If BGSAVE crashes in between, we can be sure that at least the last
data.bin file is in proper shape and not half truncated or so.
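
The write-to-a-temporary-file-then-replace approach described above can
be sketched as follows. This is a simplified, std-only illustration: the
function name `save_atomically` and the `.tmp` naming are assumptions,
and the locking of the temporary file that the commit mentions is left
out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` to a temporary file next to
/// `target` and only replaces `target` once the write has succeeded.
pub fn save_atomically(target: &Path, data: &[u8]) -> io::Result<()> {
    // Assumed temp name for this sketch, e.g. data.bin -> data.tmp
    let tmp = target.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    // The real implementation locks the temporary file before writing;
    // that step is omitted here.
    file.write_all(data)?;
    file.sync_all()?;
    drop(file);
    // If anything above failed, `target` has not been touched at all.
    fs::rename(&tmp, target)
}
```

Because the old file is only replaced by the final rename, a crash during
the write leaves the previous data.bin intact, and a data.bin that was
deleted by the user is simply recreated by the rename.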
This commit further moves the background services into their
own module(s) for easy management.
* Fix CI scripts
Fixes:
1. Our custom runner (drone/.ci.yml) was modified to kill the skyd
process once done since this pipeline is not ephemeral.
2. GHA for some reason ignores any error in the test step and proceeds
to kill the skyd process without erroring. Since GHA runners are
ephemeral, we don't need to do this manually.
What we did in the old implementation was pure over-engineering.
We relied on CoreDB's `Drop` impl to terminate the background services.
Now this is absolutely unreliable due to the nature of async functions.
We also relied on the bgsave scheduler to release the lock upon exit
which is also unreliable because we left the service to the mercy of the
runtime. We spawned the task and didn't hold as much as a `JoinHandle`
to it. That's bad because the runtime can just abort these tasks which
may result in the lock never being released. Even though it is designed
to release the lock on Drop, the destructor may however not be called at
all.
This commit fixes all those issues by simplifying the entire impl to
use Terminator. Now the background save and snapshot services run
independently, in their own tasks. Whenever the user sends a SIGINT,
we tell everyone to quit. The listeners understand that this is the
last query they'll process and the background save tasks exit almost
immediately. But what if some data was modified by this last query...?
No worries, that is completely handled by main(). The lock that BGSAVE
holds is returned to main almost immediately, and main then attempts
to flush the data right away. That's how we maintain reliability.
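
The shutdown flow described above can be illustrated with a small
sketch. Note that it swaps in plain OS threads and an `AtomicBool` for
the async tasks and the Terminator used by the actual implementation, and
that `spawn_service` and `shutdown` are hypothetical names. What it shows
is the ordering: signal everyone, wait for the services by holding their
handles, and only then let main do the final flush.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Stand-in for a background service (e.g. a save loop): it does one
/// unit of work, then checks whether it has been told to quit.
pub fn spawn_service(stop: Arc<AtomicBool>, ticks: Arc<AtomicUsize>) -> JoinHandle<()> {
    thread::spawn(move || loop {
        ticks.fetch_add(1, Ordering::Relaxed);
        if stop.load(Ordering::Acquire) {
            break;
        }
        thread::sleep(Duration::from_millis(1));
    })
}

/// Tells every service to quit, waits for all of them to exit, and only
/// then runs the final flush on the caller's side.
pub fn shutdown<F, R>(stop: &AtomicBool, services: Vec<JoinHandle<()>>, final_flush: F) -> R
where
    F: FnOnce() -> R,
{
    stop.store(true, Ordering::Release);
    for service in services {
        // Unlike a detached task, a held handle lets us wait for the exit.
        service.join().expect("service panicked");
    }
    final_flush()
}
```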
This commit adds changes so that the main process almost immediately
acquires a lock on the data file when runtime is dropped. This is just
an added precaution to try and ensure that no other process does
something silly with the data file.
The descriptor is cloned for this using `FileLock::try_clone`.
8e46e62 added a block_on_process_exit function that kept on sending
`notify_one()`s in a loop until the services terminated. This was
pointless as the `Drop` impl would do it for us anyways.
(What was I thinking?)
So, in main(), we're spawning an async task that lets the DB run as long
as we don't pass a ctrl_c (or some bad panic occurs). Once the ctrl_c
is received, we start terminating all workers. `block_on` returns DB
which should be the only one holding an atomic reference to the shared
field. We assert this right after dropping `runtime`.
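
The "only one holder left" assertion described above can be expressed
with `Arc::strong_count`. The sketch below uses a thread in place of the
async runtime, and `Shared` and `run_and_reclaim` are hypothetical names
for illustration only.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the shared state behind the database handle.
pub struct Shared {
    pub data: Mutex<Vec<u8>>,
}

/// Runs a worker that holds its own reference, waits for it to finish,
/// then asserts that the caller's handle is the last one and takes the
/// shared state back out of the `Arc`.
pub fn run_and_reclaim(db: Arc<Shared>) -> Shared {
    let worker_db = Arc::clone(&db);
    let worker = thread::spawn(move || {
        worker_db.data.lock().unwrap().push(42);
        // `worker_db` is dropped when the closure returns.
    });
    worker.join().expect("worker panicked");

    // All workers are gone, so this must be the only reference left.
    assert_eq!(Arc::strong_count(&db), 1);
    match Arc::try_unwrap(db) {
        Ok(shared) => shared,
        Err(_) => panic!("another reference is still alive"),
    }
}
```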
Finally, the ECONNRESET suppression match was fixed to remove an
unreachable branch by adding conditional compilation.
This commit ensures that the workers exit before attempting a flush_db
operation. Only after block_on_process_exit finishes do we return `db`.
Now we run a simple flush_db operation knowing that the lock has been
released.
To block on process termination, we introduce a new function
block_on_process_exit that does the same thing as CoreDB's Drop
implementation.
Also, dependencies were upgraded across all crates and the version for
`tdb-macros` was streamlined to 0.5.0 like the other crates.
Signed-off-by: Sayan Nandan <nandansayan@outlook.com>
This commit adds a basic SSL/TLS listener using `openssl`.
The `SslListener` object can accept a connection and get a decrypted
stream.
Signed-off-by: Sayan Nandan <nandansayan@outlook.com>
Since creating snapshots is quite an important utility,
there may be scenarios where creating one may be needed,
even if it is disabled on the server side. This commit
enables such snapshots to be created. This is achieved by
enabling MKSNAP to accept two arguments, where a 'named'
snapshot can be created, which is our "special" snapshot.
All these "special" snapshots are stored in a separate
"snapshots/remote" dir that is ignored by the
`SnapshotEngine`.
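
To make the directory layout concrete, here is a small sketch of how a
snapshot path might be chosen depending on whether a name was supplied.
Only the "snapshots/remote" directory comes from the description above;
the function name, the timestamp parameter and the `.snapshot` file
extension are assumptions for the example.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: named ("special") snapshots go into
/// `snapshots/remote`, everything else into `snapshots`.
pub fn snapshot_path(root: &Path, name: Option<&str>, timestamp: &str) -> PathBuf {
    match name {
        // Named snapshot requested by the client
        Some(name) => root
            .join("snapshots")
            .join("remote")
            .join(format!("{}.snapshot", name)),
        // Regular snapshot, named after the time it was taken
        None => root
            .join("snapshots")
            .join(format!("{}.snapshot", timestamp)),
    }
}
```

Keeping the named snapshots in their own subdirectory is what lets the
regular snapshot bookkeeping skip over them.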
Signed-off-by: Sayan Nandan <nandansayan@outlook.com>
Until now, the database server could only be configured via the
configuration file. This commit enables the host, port and noart
options to be configured via command-line arguments.
This is important as there may be scenarios where creating a file
presents a challenge to the user.
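
A minimal sketch of overriding file-based settings from the command line
is shown below. The flag spellings (`--host`, `--port`, `--noart`), the
`Config` struct and the `apply_cli_args` function are all hypothetical;
the commit only states that the host, port and noart options became
configurable via arguments.

```rust
/// Hypothetical configuration, e.g. as loaded from the config file.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub host: String,
    pub port: u16,
    pub noart: bool,
}

/// Applies command-line overrides on top of an existing configuration.
pub fn apply_cli_args(mut cfg: Config, args: &[&str]) -> Result<Config, String> {
    let mut it = args.iter();
    while let Some(arg) = it.next() {
        match *arg {
            "--host" => cfg.host = it.next().ok_or("--host needs a value")?.to_string(),
            "--port" => {
                let raw = it.next().ok_or("--port needs a value")?;
                cfg.port = raw
                    .parse()
                    .map_err(|_| format!("invalid port: {}", raw))?;
            }
            "--noart" => cfg.noart = true,
            other => return Err(format!("unknown argument: {}", other)),
        }
    }
    Ok(cfg)
}
```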
Signed-off-by: Sayan Nandan <nandansayan@outlook.com>
The user can now run `tdb -r <snapshotname>` to restore data from the
snapshot. Also, a note is shown in the logs when restoring from a
snapshot.
Signed-off-by: Sayan Nandan <nandansayan@outlook.com>
We don't need tests for MKSNAP when it is enabled, as we already have
tests for snapshotting in `diskstore::snapshot`.
Signed-off-by: Sayan Nandan <nandansayan@outlook.com>
In `cli`, other errors are now printed in an `[ERR]` format.
Also, the documentation across the project was updated.
Signed-off-by: Sayan Nandan <nandansayan@outlook.com>