maildir supports a 'P' flag which denotes the fact that a message has
been 'passed' on (forwarded, bounced). notmuch syncs this to the
'passed' tag.
Per https://tools.ietf.org/html/rfc5788, IMAP has a user-defined flag
(keyword) '$Forwarded' that is supported by many servers and clients
these days. (Technically, one should check for '$Forwarded' in the
server response.)
Restructure mbsync's flag parser to accept keywords (flags starting with
'$') but still bail out on unknown system flags (flags starting with '\').
Support '$Forwarded' as a first keyword since it maps to maildir's 'P'
and needs to be sorted in between the system flags.
Signed-off-by: Michael J Gruber <github@grubix.eu>
Mailbox driver flags are defined in several places. It is essential that
they are kept in sync, so mark them with the same string for easy
grepping with an alerting boiler plate.
Signed-off-by: Michael J Gruber <github@grubix.eu>
empty strings were previously meaningless, and starting with 72c2d695a,
failure to handle them lead to bogus results when the IMAP hierarchy
separator is legitimately empty (when the server genuinely supports none
and none is manually configured). non-null can be asserted more cleanly
than null-or-non-empty, so change the api like that.
incidentally, this also removes the need to work around gcc's bogus
warning in -Os mode.
problem found by "Casper Ti. Vector" <caspervector@gmail.com>
the only legitimate "deviant" UID is zero, meaning "no message". this
can be futher qualified by additional flags in the sync record, rather
than using magic values for the UID. in fact, the zero UID (so far
meaning only "expunged") was already optionally qualifed with "expired".
as a side effect, driver->store_msg() now returns 0 instead of -2 for
unknown UIDs. this was a hack to avoid translating the value later
on, but it made the api horrible, and now it's superflous in the first
place.
do that by wrapping the actual stores into proxies.
the proxy driver's code is auto-generated from function templates, some
parameters, and the declarations of the driver functions themselves.
attempts to do it with CPP macros turned out to be a nightmare.
we need a separate log entry type which does proper mmaxxuid tracking.
while moving code around, this also removes a redundant debug statement.
amends b1842617.
newmaxuid represents the highest UID for which a sync entry was created,
while maxuid represents the end of the range which is guaranteed to have
been propagated. that means that the former needs to be instantly
incremented (and logged), while the latter must not be touched until the
entire new message sync completes. this matters particularly in the case
of resuming an interrupted run, where sync entry creation must resume
exactly where it left off, while loading the box must use the old limit
to ensure that all messages are available for actual propagation.
we've been using indices to separate master/slave state for a long time,
so there is no point in using pairs of matching brackets to signify the
side in the journal. instead, use somewhat descriptive letters (S[een],
F[ind], T[rashed]) and the index itself.
they are derived from srec->status, which is unsigned. for not
understood reasons, the compiler complains only after extending status
to a full unsigned int.
on the way, localize the declarations.
when syncing flags but not re-newing non-fetched messages, there is no
need to query the message size for all messages, as the old ones are
queried only for their flags.
that's what the sources already assumed anyway. size_t is total
overkill, as No Email Ever (TM) will exceed 2GiB.
this also fixes a harmless format string warning in 32 bit builds.
trashing many messages at once inevitably overtaxes m$ exchange, and the
connection breaks. without any progress tracking, it would restart from
scratch each time, which would lead to a) it never finishing and b) many
copies of the messages in the trash.
full transactions as we do for "proper" syncing would be over the top,
as it's not *that* bad if some messages get duplicated in the trash. so
we record only the messages for which trashing completed, thus allowing
some overlap between the attempts.
it is legal for an email system to simply change the case of rfc2822
headers, and at least one imap server apparently does just that.
this would lead to us not finding our own header, which is obviously not
helpful.
REFMAIL: CA+fD2U3hJEszmvwBsXEpTsaWgJ2Dh373mCESM3M0kg3ZwAYjaw@mail.gmail.com
the legacy style is a poorly executed attempt at Maildir++, so introduce
the latter for the sake of completeness. but most users will probably
just want to use subfolders without any additional dots.
- the old meaning of -V[V] was moved to -D{n|N}, as these are really
debugging options.
- don't print the info messages by default; this can be re-enabled with
the -V switch, and is implied by most debug options (it was really
kind of stupid that verbose/debug operation disabled these).
- the sync algo/state debugging can be separately enabled with -Ds now.
propagating many messages from a fast store (typically maildir or a
local IMAP server) to a slow asynchronous store could cause gigabytes of
data being buffered. avoid this by throttling fetches if the target
context reports memory usage above a configurable limit.
REFMAIL: 9737edb14457c71af4ed156c1be0ae59@mpcjanssen.nl
don't try to lock it until we actually read or write it.
the idea is to not fail with SyncState * if we tried to load the state
before selecting a non-existing mailbox. this is ok, because if the
mailbox is missing, we obviously have no sync state pertaining to it,
either.
as a side effect, this allows simplifying an error path.
the seznam.cz IMAP server seems very eager to send UIDNEXT responses
despite not supporting UIDPLUS. this doesn't appear to be a particularly
sensible combination, but it's valid nonetheless.
however, that means that we need to save the UIDNEXT value before we
start storing messages, lest imap_find_new_msgs() will simply overlook
them. we do that outside the driver, in an already present field - this
actually makes the main path more consistent with the journal recovery
path.
analysis by Tomas Tintera <trosos@seznam.cz>.
REFMAIL: 20141220215032.GA10115@kyvadlo.trosos.seznam.cz
... for windows fs compatibility.
the maildir-specific InfoDelimiter inherits the global FieldDelimiter
(which affects SyncState), based on the assumption that if the sync
state is on a windows FS, the mailboxes certainly will be as well, while
the inverse is not necessarily true (when running on unix, anyway).
REFMAIL: <CA+m_8J1ynqAjHRJagvKt9sb31yz047Q7NH-ODRmHOKyfru8vtA@mail.gmail.com>
memcmp() is unfortunately not guaranteed to read forward byte-by-byte,
which means that the clever use as a strncmp() without the pointless
strlen()s is not permitted, and can actually misbehave with
SSE-optimized string functions.
so implement proper equals() and starts_with() functions. as a bonus,
the calls are less cryptic.
a failure here is rather unlikely, but let's be pedantic.
a failure is not fatal (we'll just enter the journal replay path next
time), so only print warnings.
found by coverity.
we would try to print the uids from the non-existing srec of unpaired
messages while preparing expiration.
this would happen only if a) MaxMessages was configured and b) new
messages appeared on the slave but we were not pushing, so it's a bit of
a corner case.
found by coverity.
by putting the message propagation last, d3f634702 uncovered a
long-standing problem: we might have closed the source store before all
messages were propagated from it.
msgs_copied() was not checked at all, and msgs_flags_set() was doing it
wrong (sync_close() was not checked).
instead of trying to fix/extend the msgs_flags_set() model (ref-counting
and cancelation checking in lower-level functions, and return values to
propagate the status), place the refs/derefs around higher-level scopes
and do the checking only there. this is effectively simpler, and does
away with some obscure macros.
as we now don't actually start propagating new messages until all TUIDs
have been generated, it's sufficient to sync just once. this makes it
a cheap operation, so we can do it at SYNC_NORMAL level already.
i.e., move it back. whatever the original reason was, it's now gone.
this order is way more natural, which allows us to remove the osrecadd
and S_DONE hacks.
this helps enormously on the first sync of a 100k message box with a
limit of 1k messages. it also happens to make the syncing idempotent.
in a few conditionals we now explicitly test for max_messages being
enabled, not smaxxuid != 0, as after the initial fetch with no important
messages smaxxuid is zero, but we still have to skip over 99k messages
in the above case.
previous sequence:
examine & propagate new => examine old => propagate old
new sequence:
examine new => examine old => propagate new => propagate old
this alone does not buy us much ...
we can bump the internal variable whereever convenient, but we cannot
log it until we know that all messages were copied, as otherwise we
could miss some new messages after an interruption. with the new
approach, interruption would merely cause some additonal traffic.
less code duplication, more logical order of issued driver commands
(especially after the next commit), and the "side effect" of letting the
message expiration code see those deletions if they are asynchronous.
the delay optimized the corner case of previously important but now
expired messages on the slave disappearing, either through an external
expunge or after a journal replay. no point in pessimizing the common
case.
the removed code would only ever trigger if a) we were after a journal
replay or b) something external expunged the expired messages - both are
corner cases not worth the extra code.
however, this means that the syncing code further down now needs to take
care of these zombies.
in the end, the normal cleanup will take care of all expired entries,
new and old.