During an index from the command line
When executing ./occ fulltextsearch:index:
-
For each content provider in your setup and for each user, FullTextSearch asks the content provider for a list of documents (Model/IndexDocument[]) that belong to the current user. At this stage each IndexDocument contains only minimal information about the document, such as its Id, so that the list of documents uses as little memory as possible.
-
FullTextSearch gets the list of known Model/Index[] from the database and compares their status with the list of documents returned by the content provider. A new list is created containing only the documents that are not in the database or whose last index date is earlier than their last modification date.
-
FullTextSearch picks the first Model/IndexDocument from the list and assigns it its related Model/Index.
-
The content provider fills the Model/IndexDocument with more data, such as the Title, the Access Rights and, of course, the Content to be indexed. Some data specific to the content provider can also be saved.
-
FullTextSearch now sends the fully completed Model/IndexDocument to the search platform for indexing.
-
The search platform will try to index the Model/IndexDocument and update the status of Model/Index.
-
FullTextSearch updates the status of the Index in the database, destroys the current Model/IndexDocument (to free memory) and proceeds to the next Model/IndexDocument in the list.
FullTextSearch returns to step #3 until no Model/IndexDocument remains in the list.
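The steps above can be sketched as a small in-memory simulation in Python. This is not the FullTextSearch PHP code: the classes `IndexDocument`, `Index`, `ToyProvider` and `ToyPlatform`, the function `run_index`, the status strings and the integer timestamps are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class IndexDocument:
    """Lightweight document stub; title and content are filled in later."""
    doc_id: str
    modified: int                      # toy integer timestamp
    title: Optional[str] = None
    content: Optional[str] = None


@dataclass
class Index:
    """Toy database record tracking the index state of one document."""
    doc_id: str
    last_index: int = 0
    status: str = "pending"            # assumed status value


class ToyProvider:
    """Hypothetical provider backed by doc_id -> (modified, title, content)."""

    def __init__(self, files: Dict[str, Tuple[int, str, str]]):
        self.files = files

    def list_documents(self, user: str) -> List[IndexDocument]:
        # Step 1: return stubs only, so the list stays small in memory.
        return [IndexDocument(doc_id, meta[0]) for doc_id, meta in self.files.items()]

    def fill(self, doc: IndexDocument) -> None:
        # Step 4: load the heavier fields for one document.
        _, doc.title, doc.content = self.files[doc.doc_id]


class ToyPlatform:
    """Hypothetical search platform that stores content in a dict."""

    def __init__(self) -> None:
        self.indexed: Dict[str, Optional[str]] = {}

    def index(self, doc: IndexDocument, index: Index, now: int) -> None:
        # Step 6: index the document and update the status of its Index.
        self.indexed[doc.doc_id] = doc.content
        index.last_index = now
        index.status = "ok"


def run_index(provider, platform, database: Dict[str, Index], user: str, now: int) -> List[str]:
    stubs = provider.list_documents(user)
    # Step 2: keep documents that are unknown or modified since the last index.
    todo = [d for d in stubs
            if d.doc_id not in database or database[d.doc_id].last_index < d.modified]
    processed = []
    while todo:                                    # loop back to step 3 until the list is empty
        doc = todo.pop(0)                          # step 3: pick the first document
        index = database.get(doc.doc_id) or Index(doc.doc_id)
        provider.fill(doc)                         # step 4
        platform.index(doc, index, now)            # steps 5 and 6
        database[doc.doc_id] = index               # step 7: persist the Index
        processed.append(doc.doc_id)               # the filled document is dropped on the next iteration
    return processed


files = {"a": (10, "A", "alpha"), "b": (30, "B", "beta"), "c": (5, "C", "gamma")}
database = {"a": Index("a", 20, "ok"), "b": Index("b", 20, "ok")}
platform = ToyPlatform()
processed = run_index(ToyProvider(files), platform, database, "alice", now=40)
```

In this run, document "a" is skipped because it was indexed after its last modification, "b" is re-indexed because it changed since its last index, and "c" is indexed because it has no record in the toy database.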
From the cron, or the :live command
By using the PHP API method updateIndexStatus(providerId, documentId, status), a content provider can indicate that a document will need to be re-indexed. For example, the Files Content Provider calls this method when a file is edited.
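The flagging mechanism can be illustrated with a minimal in-memory model in Python. It is not the PHP API: `update_index_status` only mirrors the argument order of `updateIndexStatus`, and the dictionary storage, the `on_file_edited` hook, the provider Id `"files"` and the status strings `"ok"` and `"reindex"` are assumptions made for this sketch.

```python
from typing import Dict, Tuple

# Toy stand-in for the index table: (provider_id, document_id) -> status
IndexTable = Dict[Tuple[str, str], str]


def update_index_status(table: IndexTable, provider_id: str,
                        document_id: str, status: str) -> None:
    """Record a new status so that a later pass can pick the document up."""
    table[(provider_id, document_id)] = status


def on_file_edited(table: IndexTable, file_id: str) -> None:
    """Hypothetical hook a files provider might run when a file changes."""
    update_index_status(table, "files", file_id, "reindex")


table: IndexTable = {("files", "42"): "ok", ("files", "43"): "ok"}
on_file_edited(table, "42")

# Entries that a later cron or live pass would select for re-indexing.
needs_reindex = [key for key, status in table.items() if status == "reindex"]
```

Only the edited document is flagged; the other entry keeps its status and is ignored by the next pass.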
When the crontab ticks, or if ./occ fulltextsearch:live is running:
-
FullTextSearch checks the database to generate a list of indexes (Model/Index[]) whose status indicates that the document needs to be re-indexed. Each Model/Index contains the Id of the content provider, the Id of the document and its Owner.
-
FullTextSearch picks the first Model/Index from the list and sends it to its original content provider.
-
The content provider creates a Model/IndexDocument based on the Model/Index and fills it with more data, such as the Title, the Access Rights and, of course, the Content to be indexed. Some data specific to the content provider can also be saved.
-
FullTextSearch now sends the fully completed Model/IndexDocument to the search platform for indexing.
-
The search platform will try to index the Model/IndexDocument and update the status of Model/Index.
-
FullTextSearch updates the status of the Index in the database, destroys the current Model/IndexDocument (to free memory) and proceeds to the next Model/Index in the list.
FullTextSearch returns to step #2 until no Model/Index remains in the list.
If the whole process was started by the :live command, FullTextSearch waits a few seconds before returning to step #1.
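The polling behaviour described above can be sketched in Python as follows. This is an illustrative model only: `process_pending`, `live`, the status strings and the `passes` parameter are hypothetical. A real live process runs until it is stopped, whereas this sketch is bounded so that it terminates.

```python
import time
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (provider_id, document_id)


def process_pending(table: Dict[Key, str], reindex: Callable[[Key], None]) -> List[Key]:
    """One pass: re-index every entry whose status is 'reindex'."""
    # Step 1: build the list of entries flagged for re-indexing.
    pending = [key for key, status in table.items() if status == "reindex"]
    done = []
    while pending:                 # loop back to step 2 until the list is empty
        key = pending.pop(0)       # step 2: pick the first entry
        reindex(key)               # steps 3 to 6: fill the document and index it
        table[key] = "ok"          # step 7: update the status in the toy database
        done.append(key)
    return done


def live(table: Dict[Key, str], reindex: Callable[[Key], None],
         passes: int, delay: float = 0.0) -> List[List[Key]]:
    """Repeat passes, pausing `delay` seconds before returning to step 1."""
    history = []
    for _ in range(passes):
        history.append(process_pending(table, reindex))
        time.sleep(delay)
    return history


table = {("files", "1"): "reindex", ("files", "2"): "ok"}
seen: List[Key] = []
history = live(table, seen.append, passes=2)
```

The first pass handles the flagged entry, and the second pass finds nothing to do, which is the idle state in which a live process would keep polling.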