it helps if the code actually does what the comment above it claims.
clarify it a bit, so i don't get stupid ideas again.
This reverts commit cf6a7b4d18.
propagating many messages from a fast store (typically maildir or a
local IMAP server) to a slow asynchronous store could cause gigabytes of
data to be buffered. avoid this by throttling fetches if the target
context reports memory usage above a configurable limit.
REFMAIL: 9737edb14457c71af4ed156c1be0ae59@mpcjanssen.nl
some servers actually bother to close down the SSL connection before
closing the socket.
this fixes the spurious "unhandled SSL error 6" messages.
REFMAIL: 20150120114805.GA17586@leeloo.kyriasis.com
the server can actually close the zlib stream before closing the socket,
so we need to accept it.
we don't do anything beyond that - the actual EOF will be signaled by
the socket, and if the server (erroneously) sends more data, zlib will
tell us about it.
REFMAIL: 1423048708-975-1-git-send-email-alex.bennee@linaro.org
zlib reports Z_BUF_ERROR when a flush is attempted without any activity
since the previous flush (if any). while this is harmless as such,
discerning the condition from genuine errors would be much harder than
avoiding the pointless flush in the first place.
REFMAIL: eb5681612f17be777bc8d138d31dd6d6@mpcjanssen.nl
don't retry dead Stores for every Channel.
this also introduces a state for transient errors (specifically, connect
failures), but this is currently unused.
a directory is no mailbox unless it contains a cur/ subdir.
but if that one is present, create new/ and tmp/ if they are missing.
this makes it possible to resume interrupted maildir creations.
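
a minimal sketch of the rule above, assuming a hypothetical helper (maildir_validate_sketch() is an illustrative name, not the actual function): cur/ decides whether the directory is a mailbox at all, and new/ and tmp/ are created on demand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* hypothetical helper, for illustration only.
 * returns 1 if `path` is a maildir (completing it if necessary),
 * 0 if it is not a mailbox, -1 on a system error. */
static int
maildir_validate_sketch( const char *path )
{
	static const char *subdirs[] = { "new", "tmp" };
	char buf[4096];
	struct stat st;
	int i;

	assert( path );
	/* no cur/ subdir => not a mailbox at all */
	if (snprintf( buf, sizeof(buf), "%s/cur", path ) >= (int)sizeof(buf))
		return -1;
	if (stat( buf, &st ) || !S_ISDIR( st.st_mode ))
		return 0;
	/* cur/ exists: complete a possibly interrupted creation */
	for (i = 0; i < 2; i++) {
		if (snprintf( buf, sizeof(buf), "%s/%s", path, subdirs[i] ) >= (int)sizeof(buf))
			return -1;
		if (mkdir( buf, 0700 ) && errno != EEXIST)
			return -1;
	}
	return 1;
}
```

the sketch treats an already existing new/ or tmp/ as fine without checking that it really is a directory.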
don't try to lock it until we actually read or write it.
the idea is to not fail with SyncState * if we tried to load the state
before selecting a non-existing mailbox. this is ok, because if the
mailbox is missing, we obviously have no sync state pertaining to it,
either.
as a side effect, this allows simplifying an error path.
when LITERAL+ is used, the server has no chance for early rejection of
messages. this means that the client can upload megabytes for nothing.
so simply don't use LITERAL+ for big messages. of course this adds
server roundtrips, but that's tough luck.
the limit could be arguably higher than 100k (or even configurable).
i set it to ~2 sec with my fairly average DSL line.
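
a rough sketch of the decision, assuming a cut-off of 100 * 1024 bytes (the text only says "100k") and a hypothetical function name. a non-synchronizing literal is announced as {N+}, a regular one as {N}, after which the client has to wait for the server's continuation request.

```c
#include <assert.h>
#include <stdio.h>

/* assumed value; the commit message only says "100k" */
#define MAX_NONSYNC_LITERAL_SKETCH (100 * 1024)

/* hypothetical helper: writes the literal announcement for a message of
 * `len` bytes into `buf`. returns 1 if the literal is non-synchronizing
 * (LITERAL+), 0 if the client must wait for the continuation request. */
static int
format_literal_sketch( char *buf, size_t bufsz, unsigned long len, int have_literal_plus )
{
	int nonsync = have_literal_plus && len < MAX_NONSYNC_LITERAL_SKETCH;

	assert( buf && bufsz );
	snprintf( buf, bufsz, nonsync ? "{%lu+}\r\n" : "{%lu}\r\n", len );
	return nonsync;
}
```

with the synchronizing form, the server gets a chance to reject the message before any of its data is uploaded, at the price of one extra roundtrip.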
the primary objective is reducing the number of small SSL packets (which
are always padded), but fewer syscalls in the non-SSL case should be
good as well.
instead of keeping the structures in an opaque array (which was a shadow
of the struct pollfd array if poll() was supported), make them directly
addressable.
this has the advantage that notifier-altering operations (mostly
en-/disabling) don't need to look up the structure by file handle each
time.
on the downside, data locality in the main loop is worse.
neither of these has any real effect on performance.
note that the structures are not allocated separately, but embedded into
the parent structure (like sockets already were).
the seznam.cz IMAP server seems very eager to send UIDNEXT responses
despite not supporting UIDPLUS. this doesn't appear to be a particularly
sensible combination, but it's valid nonetheless.
however, that means that we need to save the UIDNEXT value before we
start storing messages, lest imap_find_new_msgs() simply overlook
them. we do that outside the driver, in an already present field - this
actually makes the main path more consistent with the journal recovery
path.
analysis by Tomas Tintera <trosos@seznam.cz>.
REFMAIL: 20141220215032.GA10115@kyvadlo.trosos.seznam.cz
we can't leave the store FRESH, as otherwise the error handling code
will assume it is still being opened and will return to the main loop.
depending on the config this would cause an immediate termination or an
indefinite wait.
this re-introduces 6741bc94 (just a bit differently), thus effectively
reverting fbfcfea5. i suppose this extra CRLF is needed by a broken
CRAM-MD5 implementation of some server, which is why it was there in the
original implementation as well. however, it breaks more pedantic
non-broken servers. if somebody complains, we'll need to add a much
more sophisticated hack.
we should make no assumptions about the layout of OpenSSL's certificate
store, in particular when they are wrong. so copy the interesting part
instead of "deep-linking" into it.
this code is uglier than it should be, but OpenSSL's extensive use of
macros to manage data types would force us to include the ssl headers
into our headers otherwise, which would be even uglier.
REFMAIL: <545442CC.9020400@nodivisions.com>
... for windows fs compatibility.
the maildir-specific InfoDelimiter inherits the global FieldDelimiter
(which affects SyncState), based on the assumption that if the sync
state is on a windows FS, the mailboxes certainly will be as well, while
the inverse is not necessarily true (when running on unix, anyway).
REFMAIL: <CA+m_8J1ynqAjHRJagvKt9sb31yz047Q7NH-ODRmHOKyfru8vtA@mail.gmail.com>
patch initially by Jack Stone <jwjstone@fastmail.fm>,
cleaned up by Jan Synacek <jsynacek@redhat.com>,
... and then almost completely rewritten by me. ^^
RequireCRAM (another fairly stupid "use if available" option) is now
deprecated. instead, the AuthMech option can be used to give a precise
list of acceptable authentication mechanisms (which is currently "a bit"
short). in particular, this allows *not* using CRAM-MD5 even if it's
available.
instead of using a callback which messes with the certificate chain
verification, simply let OpenSSL ignore errors during that phase and
check the result only afterwards.
the combinations of the various options made quite a mess. additionally,
'RequireSSL no' is inherently insecure - "use SSL if available" is plain
stupid.
the old options are still accepted, but will elicit a warning.
it doesn't belong there - it's a property of imap_server_conf_t.
the port setup is now done while reading the config.
this makes socket.[hc] imap-agnostic.
memcmp() is unfortunately not guaranteed to read forward byte-by-byte,
which means that the clever use as a strncmp() without the pointless
strlen()s is not permitted, and can actually misbehave with
SSE-optimized string functions.
so implement proper equals() and starts_with() functions. as a bonus,
the calls are less cryptic.
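
a sketch of what such functions might look like. the names carry a _sketch suffix and the signatures are an assumption (a negative length means "call strlen()"). the point is that the lengths are compared first, so memcmp() is only ever called on ranges that are known to be valid in both buffers.

```c
#include <assert.h>
#include <string.h>

/* illustrative signatures, not necessarily the real ones.
 * strl < 0 means that `str` is NUL-terminated and its length is unknown. */
static int
starts_with_sketch( const char *str, int strl, const char *cmp, int cmpl )
{
	assert( cmpl >= 0 );
	if (strl < 0)
		strl = (int)strlen( str );
	/* length check first: memcmp() may read its inputs in any order */
	return strl >= cmpl && !memcmp( str, cmp, (size_t)cmpl );
}

static int
equals_sketch( const char *str, int strl, const char *cmp, int cmpl )
{
	assert( cmpl >= 0 );
	if (strl < 0)
		strl = (int)strlen( str );
	return strl == cmpl && !memcmp( str, cmp, (size_t)cmpl );
}
```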
such connections don't support STARTTLS. that is reasonable, as whatever
makes the connection preauthenticated (typically a Tunnel used to launch
imapd via a shell login) must already rely on the connection's security.
consequently, we would not try to use STARTTLS with such connections.
unfortunately, we'd also skip the RequireSSL check as a side effect.
this means that a rogue server (via a MITM attack) could simply offer a
preauthenticated connection to make us not use SSL, and thus bypass
server authentication. as a result, we could send potentially sensitive
data to the attacker:
- with Patterns used, we would send a LIST command which reveals the
remote Path setting. this isn't very useful to an attacker. also, IMAP
Accounts usually rely on the server-provided NAMESPACE to start with.
- with Create enabled for the remote Store, we would upload messages
from newly appeared local folders. this isn't a very likely situation,
unless the attacker manages to convince the victim to move/copy
interesting mails to a new folder right before the attack.
- with Expunge enabled for the local Store, previously synchronized
folders would be wiped. however, this would require the attacker to
know the correct UIDVALIDITY of each remote folder, which would
require incredible luck or convincing the victim to disclose them.
the first mismatch would likely tip off the victim.
in practice, someone with the level of technical and social engineering
skills required for this attack would very likely find more attractive
attack vectors. therefore, i don't consider this a particularly serious
issue.
configurations with UseIMAPS enabled or using a secure Tunnel were not
affected to start with.
a side effect of this fix is that most users of Tunnel will now need to
explicitly set RequireSSL to false.
an alternative approach would be defaulting all SSL-related settings to
off when Tunnel is used. this would be too invasive for a patch release,
but i'll consider it for 1.2.
see also CVE-2014-2567 for the Trojita MUA.
the highest assigned UID must always be at least as high as the highest
actually found UID, as otherwise we'd hand out duplicate UIDs at some
point. also, getting into such a state in the first place indicates some
potentially serious trouble, or at least external interference (e.g.,
moving/copying a message from another folder without giving it a
pristine filename).
REFMAIL: 20140626211831.GA11590@sie.protva.ru
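
one way to enforce this invariant, as a sketch with hypothetical names. it bumps the stored value to the highest UID actually found and tells the caller that it did so, so a warning can be printed.

```c
#include <assert.h>

/* hypothetical helper: returns the corrected "highest assigned UID" and
 * sets *adjusted to 1 if the stored value was too low. */
static unsigned
fix_max_uid_sketch( unsigned stored_max, const unsigned *found, int nfound, int *adjusted )
{
	int i;

	assert( adjusted );
	*adjusted = 0;
	for (i = 0; i < nfound; i++) {
		if (found[i] > stored_max) {
			/* handing out stored_max + 1 next would risk a duplicate */
			stored_max = found[i];
			*adjusted = 1;
		}
	}
	return stored_max;
}
```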
unlike the isync wrapper, mbsync does not have a default for the IMAP
user. the remote user seldom matches the local one, so "forwarding" it
is more confusing than helpful.
CCMAIL: 744389@bugs.debian.org
a failure here is rather unlikely, but let's be pedantic.
a failure is not fatal (we'll just enter the journal replay path next
time), so only print warnings.
found by coverity.
the code was copied and the original adjusted ... but not quite
completely.
this means that clashing server names never really worked since - not
that i would have expected this to be a particularly common
configuration to start with. :D
also added comments explaining why there are two implementations of the
same thing.
amends aea4be19e3 (anno 2006).
found by coverity.
we would try to print the uids from the non-existing srec of unpaired
messages while preparing expiration.
this would happen only if a) MaxMessages was configured and b) new
messages appeared on the slave but we were not pushing, so it's a bit of
a corner case.
found by coverity.
this would happen in the absurd corner case that the response code is
properly terminated with a closing bracket, but the atom itself is an
unterminated double-quoted string.
NOT found by coverity.
if something managed to make the maildir .uidvalidity files big enough
(possible only by appending garbage or scrambling them altogether), we
would overflow the read buffer by one when appending the terminating
null.
this is not expected to have any real-world impact.
found by coverity.
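
the general pattern for avoiding this kind of off-by-one, sketched with stdio and a hypothetical helper name (the actual code may well use read() directly): never read more than one byte less than the buffer size.

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical helper: reads at most bufsz - 1 bytes from `f`, so the
 * terminating null always fits. returns the number of bytes read. */
static size_t
read_small_file_sketch( FILE *f, char *buf, size_t bufsz )
{
	size_t n;

	assert( f && buf && bufsz > 0 );
	n = fread( buf, 1, bufsz - 1, f );
	buf[n] = 0;
	return n;
}
```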
for some reason lost in history, the prime_deltas were actually wrong,
leading to using composite numbers.
the right sequence is available at http://oeis.org/A092131.
the trivial approach of having "home" and "root" stores produced ugly
results, and totally failed with the introduction of nested folder
handling.
instead, create a store per local directory, just as one would manually.
CCMAIL: 737708@bugs.debian.org
makes for leaner Channel sections.
note: the global delete and expunge variables exist so the command line
can override the config file despite the otherwise backwards behavior.
the BODY[] item in the FETCH response corresponds to what we requested,
and its presence doesn't imply that it actually contains anything useful
- new messages may appear in the mailbox in addition to those we stored
ourselves, and these will obviously have no TUID.
the global timezone variable is glibc-specific.
so use timegm() instead of mktime() for the conversion.
as that is specific to the BSDs and glibc, provide a fallback.
amends 62a6099.
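
a sketch of what a timegm() fallback can look like. this is not necessarily how the actual fallback is written. it uses the well-known "days from civil date" computation and, unlike the real timegm(), does not normalize out-of-range fields.

```c
#include <assert.h>
#include <time.h>

/* illustrative fallback: converts broken-down UTC time to seconds since
 * the epoch without consulting any timezone state. */
static time_t
timegm_sketch( const struct tm *t )
{
	long y = t->tm_year + 1900L;
	int m = t->tm_mon + 1;   /* 1..12 */
	long era, yoe, doy, doe, days;

	assert( m >= 1 && m <= 12 );
	if (m <= 2)
		y--;   /* count years from march, so the leap day comes last */
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = y - era * 400;                                  /* year of era */
	doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + t->tm_mday - 1;
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* day of era */
	days = era * 146097 + doe - 719468;                   /* days since 1970-01-01 */
	return (time_t)days * 86400 + t->tm_hour * 3600L + t->tm_min * 60L + t->tm_sec;
}
```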
by putting the message propagation last, d3f634702 uncovered a
long-standing problem: we might have closed the source store before all
messages were propagated from it.
msgs_copied() was not checked at all, and msgs_flags_set() was doing it
wrong (sync_close() was not checked).
instead of trying to fix/extend the msgs_flags_set() model (ref-counting
and cancelation checking in lower-level functions, and return values to
propagate the status), place the refs/derefs around higher-level scopes
and do the checking only there. this is effectively simpler, and does
away with some obscure macros.
as the named boxes are the same on both sides, they logically make
sense only when the channel is in that mode anyway, which is the case
when using patterns.
we would see the recent timestamp of the creation and conclude that
something is going on, so we'd wait. this is obviously nonsense.
as we know that a freshly created mailbox is empty, simply skip the
message scan altogether.
as we now don't actually start propagating new messages until all TUIDs
have been generated, it's sufficient to sync just once. this makes it
a cheap operation, so we can do it at SYNC_NORMAL level already.
sneaky change on the side: the wording of the man page is changed from
"outside any section" to "before any section" to get global options.
this is not entirely true ... the previously existing options behave as
before, while the two newcomers actually affect subsequent channels.
i.e., move it back. whatever the original reason was, it's now gone.
this order is way more natural, which allows us to remove the osrecadd
and S_DONE hacks.
this helps enormously on the first sync of a 100k message box with a
limit of 1k messages. it also happens to make the syncing idempotent.
in a few conditionals we now explicitly test for max_messages being
enabled, not smaxxuid != 0, as after the initial fetch with no important
messages smaxxuid is zero, but we still have to skip over 99k messages
in the above case.
previous sequence:
examine & propagate new => examine old => propagate old
new sequence:
examine new => examine old => propagate new => propagate old
this alone does not buy us much ...
we can bump the internal variable wherever convenient, but we cannot
log it until we know that all messages were copied, as otherwise we
could miss some new messages after an interruption. with the new
approach, interruption would merely cause some additional traffic.
less code duplication, more logical order of issued driver commands
(especially after the next commit), and the "side effect" of letting the
message expiration code see those deletions if they are asynchronous.
the delay optimized the corner case of previously important but now
expired messages on the slave disappearing, either through an external
expunge or after a journal replay. no point in pessimizing the common
case.
the removed code would only ever trigger if a) we were after a journal
replay or b) something external expunged the expired messages - both are
corner cases not worth the extra code.
however, this means that the syncing code further down now needs to take
care of these zombies.
in the end, the normal cleanup will take care of all expired entries,
new and old.
that is, don't count them towards the total only below the cut-off
point. making them extend the working set even though they are inside it
is counterintuitive.
while maildir has a clearly defined meaning of "recent" and for example
mutt handles it gracefully, IMAP's definition is fubared to the point
that some servers (for example gmail) simply refuse to support it.
for symmetry reasons it is best to pretend that it doesn't exist at all.
it doesn't seem too useful anyway (the user can simply mark the messages
as read to allow pruning).
and last but not least, the man page of mbsync says nothing about
"recent", only "unread". unlike the isync man page, though.
even if we are not propagating new messages, the appearance of new
messages on the slave can lead to expiring older messages. for that, we
need to know their importance, and thus flags.
the alternative would be not doing an expiration run when not fetching
new messages, but that would mean more conditionals all over the place.
as the decision is somewhat arbitrary, just do the simpler thing.
the header is not space-critical, so use proper name-value pairs.
this has the additional advantage that subsequent format changes can be
done much more easily.
otherwise we would propagate phantom deletions.
this affected only sync runs after an interruption while storing
messages, so it went (mostly?) unnoticed.
the warning suppression pragma within function scope is apparently a new
thing.
as i don't want to disable the check for the entire function (even if
this currently would make no difference), just use a wrapper function
to suppress the format string check.
amends 9c86ec344.
S_FIND was for the sync record status field. it has no business in the
sync vars status fields. its value coincided with ST_SELECTED, which
luckily only means that we always tried to match up TUIDs even if there
was nothing to do.
the need for TUID matching arises in two mostly independent
circumstances, so add two separate flags ST_FIND_{OLD,NEW}.
this would happen if we were trying to find newly pushed messages, but
none actually arrived.
as imap's ranges are not ordered, this would actually fetch one message.
this value is only ever used to find just pushed messages by TUID, so we
can simply use the UIDNEXT value from before we started pushing - and of
course, we need to record that in the journal. it makes no sense to log
the new value after completing a search, as there won't be a next search
before we push the next messages.
the purpose of this variable is to hold the UIDNEXT value from before
we started pushing new messages, i.e., the minimal uid we can expect
them to have.
the test suite actually relies on it. it would be possible to adjust it,
but there is not much reason to make paths relative to HOME (as we
support convenient tilde expansion). so use the least invasive approach,
which is simply the old behavior. adjust the documentation accordingly.
This reverts commit da5ce5d8f4.
always use getsockopt() to query the meaning of POLLERR, rather than
reporting "Unidentified socket error".
this is unlikely to have any effect when using select(), as that one
pretty much never signals exceptional conditions.
turns out that poll() may (and on linux does) signal POLLERR on
connection failure. this is unlike select(), which is specified to
signal write readiness in every case.
consequently, check whether we are connecting before checking for
POLLERR.
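
querying the actual error behind a POLLERR can be sketched like this (the function name is hypothetical). the result can be passed to strerror() for a meaningful message, and the same query yields the outcome of a non-blocking connect().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* hypothetical helper: returns the pending error on socket `fd` (0 if
 * there is none), or the errno of a failed getsockopt() call.
 * note that reading SO_ERROR also clears it. */
static int
pending_socket_error_sketch( int fd )
{
	int soerr = 0;
	socklen_t len = sizeof(soerr);

	assert( fd >= 0 );
	if (getsockopt( fd, SOL_SOCKET, SO_ERROR, &soerr, &len ))
		return errno;
	return soerr;
}
```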
time_t may be long long. to keep the sprintf format strings simple, just
downcast - this is not going to be a problem for the next 30 years, and
until then long will be 64-bit everywhere anyway.
suggested 3.5 years ago by Antoine Reilles <tonio@NetBSD.org>.
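
the downcast amounts to something like the following sketch (hypothetical helper name):

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* hypothetical helper: formats a timestamp as a decimal number.
 * time_t has no portable printf conversion, but long does. */
static int
format_time_sketch( char *buf, size_t bufsz, time_t t )
{
	assert( buf && bufsz );
	return snprintf( buf, bufsz, "%ld", (long)t );
}
```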
leave all the hard work to OpenSSL. this has several consequences:
- certificate chain validation actually works instead of throwing
around error 20
- the interactive approval is gone. i don't expect it to be useful
anyway, as mbsync is mostly a batch tool
- the code is much shorter
we did not check a valid certificate's subject at all so far.
this is no problem if the certificate file contains only exactly the
wanted host's certificate - before revision 04fdf7d1 (dec 2000, < v0.4),
this was even enforced (more or less - if the peer cert had been
signed directly by a root cert, it would be accepted as well).
however, when the file contains root certificates (like the system-wide
certificate file typically does), any host with a valid certificate
could pretend to be the wanted host.
fdatasync() the journal after creating the pair record and recording
the TUID, but before the message propagation actually starts.
all other writes to the journal are not flushed, as they will at worst
cause some unnecessary network traffic without visible effect.
this fixes two possible failure scenarios:
- if the journal is committed but the mails are not, the missing files
would be erroneously interpreted as deletions which would be
propagated
- less seriously, if the mail files' meta data was committed but the
file contents were not, we would end up with empty files, which would
have to be re-fetched "behind mbsync's back" (just deleting the files
would not work - see above)
make sure that the new state is committed to disk before overwriting the
old version - by default meta data is committed first, so we may end up
with no valid state at all otherwise.
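
the "commit before overwriting" step can be sketched as follows. function name and paths are hypothetical, and the sketch uses fsync() where fdatasync() would also do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* hypothetical helper: writes `data` to `tmp_path`, forces it to disk,
 * and only then renames it over `path`. returns 0 on success, -1 on
 * failure (in which case the old version is left untouched). */
static int
save_state_sketch( const char *path, const char *tmp_path, const char *data )
{
	FILE *f;

	assert( path && tmp_path && data );
	if (!(f = fopen( tmp_path, "w" )))
		return -1;
	/* fflush() empties the stdio buffer, fsync() the kernel's */
	if (fputs( data, f ) == EOF || fflush( f ) || fsync( fileno( f ) )) {
		fclose( f );
		unlink( tmp_path );
		return -1;
	}
	if (fclose( f ))
		return -1;
	/* the contents are on disk now, so replacing the old version is safe */
	return rename( tmp_path, path ) ? -1 : 0;
}
```

without the fsync(), the rename (meta data) may hit the disk before the file contents do, which is exactly the "no valid state at all" case.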
this removes the pathological O(<number of sync records> * <number of
new messages>) case at the cost of being a bit more cpu-intensive (but
O(<number of all messages>)) for old messages.
when we find that the store is incompatible with in-store sync state,
we want to fail the whole channel. however, we must not claim that the
store died, otherwise it won't be disposed of properly.
pass DB_TRUNCATE when creating databases. otherwise bdb will complain
about the empty file we pass it (we have to create it upfront to
implement our locking).
in fact, UIDNEXT (and UIDVALIDITY) null is *not* allowed (see RFC3501
section 9). them popping up nonetheless was a dovecot bug (which would
also confuse dovecot itself).
having it in as a workaround was no good either, as quite some other
code in mbsync assumes that UIDs are not null.
This reverts commit e1fa867 and most of 39006d7.
-REFMAIL: 4CA62BA1.4020104@lemma.co.uk
files may be renamed (due to new -> cur transition or flag changes),
which may lead to two effects if ignored:
- we see both the old and the new name, so we report a spurious
duplicate UID
- we see neither name, so we report a spurious deletion
as a countermeasure, record and compare directory modification times. upon
mismatch, we just start over - as usual.
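
the countermeasure can be sketched like this, with a hypothetical callback standing in for the real directory scan. note that st_mtime has only one-second granularity, so this simplified version can miss a rename that happens within the same second as the first stat().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/* hypothetical callback type standing in for the real directory scan */
typedef int (*scan_fn_sketch)( const char *dir, void *aux );

/* trivial sample scan which merely counts its invocations */
static int
count_scan_sketch( const char *dir, void *aux )
{
	(void)dir;
	++*(int *)aux;
	return 0;
}

/* rescans until the directory's mtime is the same before and after a
 * scan, or until `max_tries` is exhausted.
 * returns 0 after a stable scan, -1 otherwise. */
static int
scan_stable_sketch( const char *dir, scan_fn_sketch scan, void *aux, int max_tries )
{
	struct stat before, after;

	assert( dir && scan );
	while (max_tries-- > 0) {
		if (stat( dir, &before ) || scan( dir, aux ) || stat( dir, &after ))
			return -1;
		if (before.st_mtime == after.st_mtime)
			return 0;   /* nothing was renamed while we were looking */
		/* mismatch: the listing may be inconsistent, so start over */
	}
	return -1;
}
```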
don't try to unlock and close databases and files - this will happen a
moment later anyway, through cancelation or re-selection.
ironically, this plugs a memory leak, because an open main database is
used as a signal to close a temporary db in maildir_scan().
instead of SEARCHing every single message (which is slow and happens to
be unreliable with M$ Exchange 2010), just FETCH the new messages from
the mailbox - the ones we just appended will be amongst them.
unless an info message is explicitly marked as a continuation, it must
terminate any pending line (typically the progress information) first.
debug output is not affected, as it is mutually exclusive with info
output, and no debug lines are left unterminated outside clear scopes.
- introduce sys_error() and use it instead of perror() and
error(strerror()) in all expected error conditions
- perror() is used only for "something's really wrong with the system"
kind of errors
- file names, etc. are quoted if they are not validated yet, so e.g. an
empty string becomes immediately obvious
- improve and unify language
- add missing newlines
- asynchronous sockets using an event loop
- connect & starttls have completion callback parameters
- callbacks for notification about filled input buffer and emptied
output buffer
- unsent imap command queue
- used when
- socket output buffer is non-empty
- number of commands in flight exceeds limit
- last sent command requires round-trip
- command has a dependency on completion of previous command
- trashnc is tri-state so only a single "scout" trash APPEND/COPY is
sent at first. a possibly resulting CREATE is injected in front of
the remaining trash commands, so they can succeed (or be cancel()d
if it fails).
- queue's presence necessitates imap_cancel implementation
this prepares the code for being called from a callback.
notably, this makes the imap list parser have a "soft stack", so the
recursion can be suspended at any time.
instead of returning a write()-like result, return only a binary status
code - write errors are handled internally anyway, so user code doesn't
have to check the write length.
this makes the IMAP command submission interface asynchronous.
the functions still have synchronous return codes as well - this enables
clean error return paths. only when we invoke callbacks we resort to
refcounting.
as a "side effect", properly sequence commands after CREATE resulting
from [TRYCREATE].
synchronous error codes which are passed through callbacks aren't a
particularly good idea, after all: at the latest when the callback does stuff
which does not concern the caller, the return code becomes ambiguous.
instead, protect the sync_vars object with a refcount when invoking
driver functions from loops, as the callbacks they call could invalidate
the object and we would have no way of knowing that the loop should be
aborted prematurely. the upcoming async imap driver will also need a
refcount to protect the cancelation marker of the imap socket dispatcher
loop.
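
the refcounting pattern, reduced to a sketch with made-up names (vars_sketch_t stands in for the sync_vars object, and fake_driver_call_sketch() for a driver function whose callback may cancel the operation):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
	int ref_count;   /* starts at 1: the "base" reference */
	int work_done;
} vars_sketch_t;

static vars_sketch_t *
vars_new_sketch( void )
{
	vars_sketch_t *v = calloc( 1, sizeof(*v) );
	if (v)
		v->ref_count = 1;
	return v;
}

static void
vars_ref_sketch( vars_sketch_t *v )
{
	v->ref_count++;
}

/* drops one reference; returns -1 if that freed the object */
static int
vars_deref_sketch( vars_sketch_t *v )
{
	assert( v->ref_count > 0 );
	if (!--v->ref_count) {
		free( v );
		return -1;
	}
	return 0;
}

/* stand-in for a driver function: if `cancel` is set, its "callback"
 * cancels the operation, which drops the base reference */
static void
fake_driver_call_sketch( vars_sketch_t *v, int cancel )
{
	if (cancel)
		(void)vars_deref_sketch( v );
}

static int
process_all_sketch( vars_sketch_t *v, int n, int cancel_at )
{
	int i;

	for (i = 0; i < n; i++) {
		vars_ref_sketch( v );   /* keep v alive across the call */
		fake_driver_call_sketch( v, i == cancel_at );
		if (vars_deref_sketch( v ))
			return -1;   /* v is gone: abort the loop, don't touch it */
		v->work_done++;
	}
	return 0;
}
```

the loop needs no return code from the driver call. dropping its own reference tells it whether the object survived.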
that way we don't have to piggy-back (possibly asynchronous) fatal
errors to particular commands.
internally, the drivers still use synchronous return values as well,
so they don't try to access the invalidated store after calling back.
just use the presence of an SSL object as an indicator. if something
goes wrong during the ssl handshake or certificate validation, the
socket must be immediately closed anyway.
don't pretend that the server has no literal+ for the time of the
first relevant command's synchronous execution. instead, enable the
lower layer to do the processing by telling it for which commands
trashnc ("trash's existence not confirmed") is relevant.
we always actually open the mailbox before appending to it, so we
obviously know that it exists - that's why the code was already
commented out. changing this assumption would significantly complicate
matters for little gain, so let's just assume it won't happen.
consequently, also don't set param.create when appending to regular
mailboxes.
- don't silently fail in release mode (expression with side effects
inside assert())
- save some redundant strlen()s by not throwing away known lengths
- reorganize the code for legibility
if the header contained no CRs but the body (or the post-TUID part of
the header) did, the TUID insertion would add an excess CR, thus
overflowing the buffer by one byte.
we'd send a LOGOUT command in plain text while the server was already
expecting an encrypted command, which would typically lead to waiting
for more data and thus an indefinite hang.
so close the socket immediately instead of letting the normal shutdown
path take care of it.
inspired by a patch by Steven Flintham.
-REFMAIL: 4C9AB98E.3000400@lemma.co.uk
this is basically a security fix for nonsensical configurations:
if the specified CertificateFile did not contain any certificates,
we *might* have accepted an arbitrary server certificate.
imap may very well store messages with LF line endings. only RFC2822
requires CRLF.
consequently, preserve the line endings as much as possible unless the
mailbox format does not support it (this would be the case for unix mbox
- i actually have no idea about maildir).
a bit ugly for the "SyncState *" case, as we have to create a directory
without making it a maildir right away. however, this makes the code
quite a bit simpler to understand and simpler to parallelize.