Alfresco Solr Trackers Showcase

In 2014, while working at Alfresco, I helped upgrade from using Solr 1.4 to Solr 4.9, and in doing so I changed much of the Solr tracking code.  We were on Alfresco version 4.2, and our next big release would be 5.0.  Here’s an overview of my work.

We started with a single solr project.  In ACE-916 and other issues we split the code out into two additional projects, solr-client and solr4.

The solr-client Project

The solr-client project contains our Java API for connecting to Alfresco for tracking.  Despite its name, this project really provides a proxy to Alfresco rather than to Solr.  I moved the code from the original solr project that was relevant to all Solr versions into this project.  I created the SOLRAPIClientFactory and its associated SOLRAPIClientFactoryTest.  I also created a set of adapter interfaces to keep some dependent code in the project free from dependencies on specific versions of the classes they wrap.  The solr and solr4 projects each depend on solr-client but are independent of each other.
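To illustrate the adapter idea, here is a minimal sketch rather than the actual Alfresco code.  The names DocIdSetAdapter, JavaBitSetAdapter, and DocIdCollector are hypothetical, and java.util.BitSet stands in for whatever version-specific class a real adapter would wrap.

```java
import java.util.BitSet;

// Hypothetical adapter interface. Shared code depends only on this type,
// never on a version-specific bit set class.
interface DocIdSetAdapter {
    void set(int docId);
    boolean get(int docId);
    int cardinality();
}

// Stand-in implementation backed by java.util.BitSet. In a version-specific
// module, a class like this would wrap that version's own bit set instead.
class JavaBitSetAdapter implements DocIdSetAdapter {
    private final BitSet delegate = new BitSet();

    @Override
    public void set(int docId) { delegate.set(docId); }

    @Override
    public boolean get(int docId) { return delegate.get(docId); }

    @Override
    public int cardinality() { return delegate.cardinality(); }
}

// Version-independent code: it compiles against the interface alone.
class DocIdCollector {
    // Marks every given id and returns how many distinct ids are now marked.
    static int markAll(DocIdSetAdapter docs, int[] docIds) {
        for (int id : docIds) {
            docs.set(id);
        }
        return docs.cardinality();
    }
}
```

The point of the design is that only the implementing class has to change when the wrapped library changes; callers such as DocIdCollector stay untouched.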

The solr Project

The solr project contains code relevant to Solr 1.4.  I created the Solr 1.4-specific implementations of the adapters.  The original CoreTracker had more than 2,000 lines of code and did all kinds of tracking sequentially.  This was refactored in the solr4 project.  I also created the InformationServer interface and the LegacySolrInformationServer.

The solr4 Project

The solr4 project is specific to Solr 4.9.  I created this project, and the majority of my work was done here.  I created the ModelTracker, ContentTracker, and MetadataTracker, and I contributed to the AbstractTracker and AclTracker.  With tracking now split across several kinds of trackers, they used the ThreadHandler, QueueHandler, and AbstractWorkableRunner to take advantage of multi-threading.  The ContentTracker took advantage of our new SolrContentStore, which we used as a cache so that a reindex did not have to hit Alfresco.  Associated tracker tests are here.
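The general shape of multi-threaded tracking can be sketched with the standard java.util.concurrent classes.  This is an assumption-laden illustration, not how ThreadHandler, QueueHandler, or AbstractWorkableRunner are actually written: the class BatchIndexingSketch and its methods are hypothetical, and "indexing" a node is reduced to recording its id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: split the node ids to track into batches and let a
// fixed thread pool process the batches in parallel.
class BatchIndexingSketch {
    // Thread-safe set standing in for the index.
    private final Set<Long> indexed = ConcurrentHashMap.newKeySet();

    // Placeholder for the real per-node indexing work.
    void indexNode(long nodeId) {
        indexed.add(nodeId);
    }

    // Returns the number of nodes processed. batchSize and threads must be > 0.
    int trackAll(List<Long> nodeIds, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> workers = new ArrayList<>();
            for (int start = 0; start < nodeIds.size(); start += batchSize) {
                int end = Math.min(start + batchSize, nodeIds.size());
                final List<Long> batch = nodeIds.subList(start, end);
                workers.add(() -> {
                    for (long id : batch) {
                        indexNode(id);
                    }
                    return batch.size();
                });
            }
            int total = 0;
            // invokeAll blocks until every batch has finished.
            for (Future<Integer> result : pool.invokeAll(workers)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Tracking was interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    int indexedCount() {
        return indexed.size();
    }
}
```

Compared with a single sequential loop, the only shared state here is the thread-safe set, so batches can run independently of each other.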

I also changed how we triggered tracking.  Alfresco has a separate Solr core for each Alfresco store, for example workspace://SpacesStore for “live” content and archive://SpacesStore for “deleted” content.  The AlfrescoCoreAdminHandler, which is a custom CoreAdminHandler, instantiates a SolrTrackerScheduler, which schedules a CoreWatcherJob.  The CoreWatcherJob goes through the Solr cores and registers the information server and the trackers with the admin handler.  To do this I created a TrackerRegistry to register trackers per core.  Here are the SolrTrackerSchedulerTest and the TrackerRegistryTest.
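A per-core registry of this kind can be sketched as a map from core name to that core's trackers.  The sketch below is illustrative only: CoreTrackerRegistry, the Tracker interface, the two counting trackers, and all method names are hypothetical and are not the real TrackerRegistry API.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal tracker contract.
interface Tracker {
    void track();
}

// Two toy trackers that only count how often they ran.
class CountingModelTracker implements Tracker {
    int runs;
    @Override
    public void track() { runs++; }
}

class CountingAclTracker implements Tracker {
    int runs;
    @Override
    public void track() { runs++; }
}

// Hypothetical registry: core name -> (tracker class -> tracker instance).
class CoreTrackerRegistry {
    private final Map<String, Map<Class<? extends Tracker>, Tracker>> byCore =
            new ConcurrentHashMap<>();

    // Registers a tracker for a core, replacing any tracker of the same class.
    void register(String coreName, Tracker tracker) {
        byCore.computeIfAbsent(coreName, k -> new ConcurrentHashMap<>())
              .put(tracker.getClass(), tracker);
    }

    // Returns all trackers of a core, or an empty collection for unknown cores.
    Collection<Tracker> trackersFor(String coreName) {
        Map<Class<? extends Tracker>, Tracker> trackers = byCore.get(coreName);
        if (trackers == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableCollection(trackers.values());
    }

    // Returns the tracker of the given class for a core, or null if none is registered.
    <T extends Tracker> T trackerFor(String coreName, Class<T> type) {
        Map<Class<? extends Tracker>, Tracker> trackers = byCore.get(coreName);
        return trackers == null ? null : type.cast(trackers.get(type));
    }
}
```

A job that walks the cores could then call register once per core and tracker, and a scheduler could later look up exactly the trackers that belong to the core it is servicing.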

As required by the new SolrCore, I created an AlfrescoSolrCloseHook along with its AlfrescoSolrCloseHookTest.  I created the InformationServer interface and the SolrInformationServer implementation for solr4.  I wanted to have one InformationServer interface in the solr-client project, but the implementations were so different that it didn’t fit.  Now I almost feel like there is no point in having the interface in both projects, but I left them in there anyway.

I implemented the adapters mentioned above here.  Since we were trying to make our new implementation of Solr cloud-friendly, I implemented Cloud to facilitate running Solr queries in the cloud.  Along with that came the SolrCoreTestBase and the CloudTest.

For ACE-3126 I ensured that module models are retrieved before queries go through during installation.  I created the EnsureModelsComponent, which makes queries block until the first model sync with the repository is done.
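The blocking behaviour can be pictured as a one-shot gate.  The following is a minimal sketch under my own assumptions, not the EnsureModelsComponent itself: the class FirstModelSyncGate and its methods are hypothetical, and a timeout is added so a caller never waits forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical one-shot gate: query threads wait until the first model sync
// has been reported, after which the gate stays open permanently.
class FirstModelSyncGate {
    private final CountDownLatch firstSync = new CountDownLatch(1);

    // Called by the model tracking side once the first sync has finished.
    void modelSyncCompleted() {
        firstSync.countDown();
    }

    // Called on the query path. Returns true if models are available,
    // false if the timeout elapsed or the waiting thread was interrupted.
    boolean awaitFirstSync(long timeout, TimeUnit unit) {
        try {
            return firstSync.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isSynced() {
        return firstSync.getCount() == 0;
    }
}
```

Once modelSyncCompleted has been called, later calls to awaitFirstSync return immediately, so the gate only costs anything during startup.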

Testing

When writing unit tests, I prefer to use a mocking framework so that the tests have no external dependencies such as a database or an app server.  That’s why I invested time in blazing a trail for using Mockito at Alfresco.  Of course we also performed integration tests, like those in the AlfrescoCoreAdminTester, and manual tests as documented in various Jira issues.  I didn’t participate in the performance tests, but I know they were done.

Mavenization

The code was originally built using Ant, and I mavenized the build.  This included a solr4-ssl profile in the solr4 pom to enable secure comms and a solr-http profile in the solr pom to disable SSL.

Conclusion

I learned about Solr, multi-threading, scheduling jobs in Alfresco, Ant, Maven, and generally how to populate the Solr index with all relevant information in an enterprise content management system like Alfresco.  I have come to know that this problem is something that others like Lucidworks have solved.  I have developed a passion for this area and would like to do more on it in the future.

 


3 thoughts on “Alfresco Solr Trackers Showcase”

  1. I recently started looking into Alfresco Solr integration. This is truly one of the very few insightful posts. Alfresco official documentation hardly touches these details.

    A quick question – how do you set up your environment for alfresco-solr4 development? For example, what if you want to set some breakpoints in a tracker or AFTS handler Java class to debug and step through in Eclipse? You cannot really do that with Alfresco SDK generated projects, not even the all-in-one type. With the mavenization, is it a matter of checking out “alfresco-solr4” and running “mvn clean install -Prun”? But I am sure there are some additional configurations you need to take care of…

    Thanks!


  2. I created an SDK repo project and checked out the alfresco-solr4 project from source (community 5.0.d). They were configured for Alfresco and Solr to talk to each other (port 8080/8083). I ran “alfresco-solr4” using the non-SSL profile.

    Search from Alfresco (SolrQueryHTTPClient) can reach Solr (AlfrescoSearchHandler). However, the Solr trackers could not talk to Alfresco. The log has errors like

    ERROR [solr.tracker.AbstractTracker] [SolrTrackerScheduler_Worker-1] Model tracking failed
    org.alfresco.error.AlfrescoRuntimeException: 04090000 GetModelsDiff return status is 302

    The interesting thing is that the log also has the following errors. Since it was started with the non-SSL profile, why are we still seeing SSL-related errors?

    org.apache.coyote.AbstractProtocol start
    INFO: Starting ProtocolHandler ["http-bio-8446"]
    ERROR [solr.tracker.AbstractTracker] [SolrTrackerScheduler_Worker-1] Tracking failed
    javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

