Mailing list archive seems to not update its search index #13355

Closed
opened 2026-05-19 10:43:14 +00:00 by jwakely · 5 comments

### Description of request

Is the `devel` mailing list search failing to update its index?

Searching for "gcc snapshots in Koji and copr", which is the exact title of a thread from 7 days ago, only returns results from 3+ months ago: https://lists.fedoraproject.org/archives/search?q=gcc+snapshots+in+Koji+and+copr&page=1&mlist=devel%40lists.fedoraproject.org&sort=date-desc

The thread is [present in the archive](https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/VHPYUVPYKT66JXBQAYX3OBDESS54VYPI/), it's just not searchable.
Owner

I checked the Mailman setup: the index of each list should be updated every hour, and every month as well.

I'm now generating the index for the `devel` mailing list manually; there are around 333877 mails to be indexed. It's possible that the indexing library has an issue with a mailing list that big. I will wait for it to finish and see whether it reports any error.
Owner

Running the indexing manually for the list fixed the index.

But looking at the indexer log, I found this:

```
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]: [ERROR/MainProcess] Failed indexing 1 - 1000 (retry 5/5): Term too long (> 245): XTEXT\'identifier\'=\'s1ev2l\'}#.dir/\'mediatic\'#.dir/@{\'name\'=\'visual\'}#.dir/@{\'name\'=\'2d\'}#.dir/@{\'name\'=\'pictography\'}#.dir/@(\'reality\', (pid 1099315): Term too long (> 245): XTEXT\'identifier\'=\'s1ev2l\'}#.dir/\'mediatic\'#.dir/@{\'name\'=\'visual\'}#.dir/@{\'name\'=\'2d\'}#.dir/@{\'name\'=\'pictography\'}#.dir/@(\'reality\',
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]: Traceback (most recent call last):
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]:   File "/usr/lib/python3.9/site-packages/haystack/management/commands/update_index.py", line 119, in do_update
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]:     backend.update(index, current_qs, commit=commit)
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]:   File "/usr/lib/python3.9/site-packages/xapian_backend.py", line 94, in wrapper
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]:     func(self, *args, **kwargs)
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]:   File "/usr/lib/python3.9/site-packages/xapian_backend.py", line 501, in update
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]:     database.replace_document(document_id, document)
May 20 08:07:24 mailman01.rdu3.fedoraproject.org mailman-web[1099315]: xapian.InvalidArgumentError: Term too long (> 245): XTEXT\'identifier\'=\'s1ev2l\'}#.dir/\'mediatic\'#.dir/@{\'name\'=\'visual\'}#.dir/@{\'name\'=\'2d\'}#.dir/@{\'name\'=\'pictography\'}#.dir/@(\'reality\',
```

I will try to find out what is causing this and fix it.
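
For readers unfamiliar with the error: the `245` is the term length limit reported by the Xapian backend. Below is a minimal sketch of how one might look for offending tokens in a message body. It assumes the limit is counted in bytes of the UTF-8 encoded term, that the `XTEXT` field prefix seen in the log counts toward it, and that tokens are split on whitespace (the real term generator tokenizes differently). `find_overlong_terms` is a hypothetical helper, not part of `xapian_haystack`.

```python
MAX_TERM_BYTES = 245  # limit quoted in the "Term too long (> 245)" error


def find_overlong_terms(text, prefix="XTEXT", limit=MAX_TERM_BYTES):
    """Return whitespace-separated tokens whose prefixed UTF-8 form exceeds the limit."""
    overlong = []
    for token in text.split():
        # The prefix is assumed to be part of the stored term, as in the log above.
        term = prefix + token
        if len(term.encode("utf-8")) > limit:
            overlong.append(token)
    return overlong


# A body with one 300-character run that contains no whitespace.
body = "normal words " + "x" * 300 + " more words"
overlong = find_overlong_terms(body)
```

Under these assumptions, a single long run of characters without whitespace (such as the path-like string in the log) is enough to make one term exceed the limit.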
Owner

After some digging I found out what is causing this. It's an [issue in xapian_haystack](https://github.com/notanumber/xapian-haystack/issues/77): indexing fails when an e-mail contains a term that is too long for it (the "Term too long (> 245)" error in the log above). Because the hourly indexing tries to index all new e-mails, it fails every time. I assume it started failing when it reached a mailing list containing the mail that triggers this error.

I will introduce a patch to `xapian_haystack` that simply skips those mails during processing until this is resolved upstream.
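
The skip-on-error idea can be sketched roughly as follows. This is not the actual patch: `TermTooLongError` is a hypothetical stand-in for `xapian.InvalidArgumentError`, and `index_batch`, `fake_index_one`, and the toy documents are made up for illustration so the example runs without Xapian installed.

```python
import logging

logger = logging.getLogger("indexer")


class TermTooLongError(ValueError):
    """Hypothetical stand-in for xapian.InvalidArgumentError."""


def index_batch(documents, index_one):
    """Index each (doc_id, body) pair; skip and log the ones the backend rejects.

    Returns (indexed_count, skipped_ids).
    """
    indexed = 0
    skipped = []
    for doc_id, body in documents:
        try:
            index_one(doc_id, body)
        except TermTooLongError as exc:
            # One bad mail no longer aborts the whole batch.
            logger.warning("Skipping document %s: %s", doc_id, exc)
            skipped.append(doc_id)
            continue
        indexed += 1
    return indexed, skipped


def fake_index_one(doc_id, body):
    # Toy backend: reject any document containing a token longer than 245 bytes.
    if any(len(tok.encode("utf-8")) > 245 for tok in body.split()):
        raise TermTooLongError("Term too long (> 245)")


docs = [(1, "hello world"), (2, "y" * 400), (3, "another mail")]
result = index_batch(docs, fake_index_one)
```

The trade-off in a sketch like this is that skipped mails stay unsearchable, so logging their ids matters for re-indexing them once the backend handles long terms.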
Owner

The patch is now deployed on the machine and I'm manually re-indexing all the mailing lists, so that everything is in place. I will close this once the re-indexing has finished and all the lists are fixed. In any case, the `devel` mailing list index is already fixed.
Owner

The re-indexing is now finished. Closing this as done.
Reference
infra/tickets#13355