Solr With TYPO3 Indexes All Kinds Of Records But Does Not Index Pages

Pages records are not indexed in the same way as other records. They represent the individual pages of a website, which are built from other records, so they are indexed by requesting the frontend. Every now and then there are instances where the frontend can't be indexed: the pages records can be added to the indexing queue, but all indexing calls result in an error.

What is needed to index pages?

Of course you need a connection to the Solr server and a base configuration to activate the Solr indexer, but that should already work if you can index other records such as news.

You need some TypoScript configuration, which should be present if you include the static templates from the extension:

plugin.tx_solr {
    index {
        queue {
            pages = 1
            pages {
                initialization = ApacheSolrForTypo3\Solr\IndexQueue\Initializer\Page

                // allowed page types (doktype) when indexing records from table "pages"
                allowedPageTypes = 1,7,4

                indexingPriority = 0

                indexer = ApacheSolrForTypo3\Solr\IndexQueue\PageIndexer
                indexer {
                    // add options for the indexer here
                }

                // Only index standard pages and mount points that are not overlayed.
                additionalWhereClause = (doktype = 1 OR doktype=4 OR (doktype=7 AND mount_pid_ol=0)) AND no_search = 0

                //exclude some html parts inside TYPO3SEARCH markers by classname (comma list)
                excludeContentByClass = typo3-search-exclude

                fields {
                    sortSubTitle_stringS = subtitle
                }
            }
        }
    }
}

But this alone does not get the page content into the index.



What else needs to be configured?

The frontend must be available.
Some server configurations do not allow the server to access its own pages. Make sure the pages can be requested.
If access is not possible via the original domain, you can configure a helper domain through which the Solr indexer can access the pages. Make sure the correct domain is stored in the URL of the index entry.
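
If such a helper domain is needed, the page indexer in older EXT:solr releases documents frontendDataHelper options for it. The following is only a sketch: whether these options are available depends on the installed EXT:solr version, and the host name internal.example.org is a made-up placeholder.

```typoscript
plugin.tx_solr.index.queue.pages.indexer {
    // Assumption: frontendDataHelper.* is supported by the installed EXT:solr version.
    frontendDataHelper {
        // Hypothetical helper domain that the server is allowed to reach.
        scheme = https
        host = internal.example.org
    }
}
```

After a change like this, check that the URL stored in the indexed documents still points to the public domain.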

The pages need the appropriate markers to mark the relevant content, so that menus and similar elements do not spam the index with irrelevant content:
<!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->
Without these markers, which can occur multiple times, the whole document is indexed.
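
As a minimal sketch of how the markers can be placed, the main content can be wrapped in TypoScript. The object path page.10 is an assumption about the site setup; use whichever content object renders the main content.

```typoscript
// Assumption: page.10 is the content object that renders the main content.
// Only the output between the two markers is considered for the index.
page.10.stdWrap.wrap = <!--TYPO3SEARCH_begin-->|<!--TYPO3SEARCH_end-->
```

In an HTML or Fluid template the same two comments can be written directly around the content area instead.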

But there are some further options which stop indexing:
As seen in the question, the doktype is also considered, as is visibility.
Pages have an option Include in Search [no_search], which is exposed to external search engines but is also evaluated by Solr.

Finally, there is an option which Solr has adopted from indexed_search, but only for the indexing of pages: config.index_enable = 1
Without this option you can index records, but all pages throw an error when they are processed in the indexing queue.
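
In the TypoScript setup this option is a single line inside the config block. Where it is placed (the site's root template or an extension's setup file) depends on the project:

```typoscript
config {
    // Without this, pages in the indexing queue end in an error.
    index_enable = 1
}
```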