I wanted a multi-tenant database that needs no prior knowledge of the data. In other words: which database should I use to build a Salesforce-like infrastructure? The option that finally came closest to what I was considering was HBase. I liked the direction the project was taking, and the Amazon platform is well suited to running it. As mentioned, HBase is perfect for storing and retrieving information by primary key. But what about the search piece? In onelineweb we put the following mechanism in place:
Assumptions
1 - Reads are about 5 times as frequent as data creation.
2 - Only active workflow records change.
Problem Faced
Each node maintains its own local Solr instance, and the Solr indexes on all active nodes must be identical. Because information can change on any node, the Solr indexes have to be updated incrementally.
The Solution
I created a sync table. Whenever a record changes, I add an entry to this sync table. Each machine keeps the last sync id it has processed, fetches all content changed since that id, and updates its local Solr instance.
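The sync-table loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: SQLite stands in for the real database, a plain dict stands in for each node's Solr core, and the table and column names (`records`, `sync_log`, `sync_id`) are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: the post only says "a sync table" with increasing ids.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
    db.execute(
        "CREATE TABLE sync_log ("
        "sync_id INTEGER PRIMARY KEY AUTOINCREMENT, record_id INTEGER)"
    )
    return db


def save_record(db: sqlite3.Connection, record_id: int, body: str) -> None:
    """Write a record and log the change in the same transaction."""
    with db:  # commits both statements together, or rolls both back
        db.execute(
            "INSERT OR REPLACE INTO records (id, body) VALUES (?, ?)",
            (record_id, body),
        )
        db.execute("INSERT INTO sync_log (record_id) VALUES (?)", (record_id,))


class NodeIndex:
    """Stand-in for one node's local Solr core: record_id -> indexed body."""

    def __init__(self) -> None:
        self.docs: dict = {}
        self.last_sync_id = 0  # each machine remembers where it stopped

    def catch_up(self, db: sqlite3.Connection) -> int:
        """Re-index every record changed after last_sync_id; return rows applied."""
        rows = db.execute(
            "SELECT s.sync_id, r.id, r.body FROM sync_log s "
            "JOIN records r ON r.id = s.record_id "
            "WHERE s.sync_id > ? ORDER BY s.sync_id",
            (self.last_sync_id,),
        ).fetchall()
        for sync_id, record_id, body in rows:
            self.docs[record_id] = body  # index the record's current version
            self.last_sync_id = sync_id
        return len(rows)
```

Because every node replays the same ordered log from its own checkpoint, all nodes converge on the same index without full rebuilds. Deletions are not handled here; a real version would need a tombstone entry in the log.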
The Solr index can grow large. To distribute it across many nodes, we need a platform that assembles results from all these places and serves them. The architecture keeps a common set of actively changing information on some nodes, while other nodes hold frozen records that rarely change. We use the MapReduce paradigm to manage the distributed Solr index.
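As a toy illustration of the map/reduce split over an active shard and frozen shards, the sketch below scores documents per shard (map) and merges the partial hit lists into one global top-k (reduce). It is an assumption-laden sketch, not Solr's own distributed search: shards are plain dicts, and scoring is naive term frequency chosen only for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def map_shard(shard: Dict[str, str], term: str) -> List[Hit]:
    """Map step: score each document in one shard by naive term frequency."""
    hits: List[Hit] = []
    for doc_id, text in shard.items():
        tf = text.lower().split().count(term.lower())
        if tf:
            hits.append((float(tf), doc_id))
    return hits


def reduce_hits(partials: List[List[Hit]], k: int) -> List[Hit]:
    """Reduce step: merge per-shard hit lists into a single global top-k."""
    return heapq.nlargest(k, (hit for part in partials for hit in part))


def search(shards: List[Dict[str, str]], term: str, k: int = 10) -> List[Hit]:
    # In a real cluster the map calls would run on the nodes holding each shard.
    return reduce_hits([map_shard(shard, term) for shard in shards], k)


active = {"a1": "invoice open invoice"}
frozen = {"f1": "invoice settled", "f2": "claim rejected"}
print(search([active, frozen], "invoice"))  # [(2.0, 'a1'), (1.0, 'f1')]
```

Keeping the frozen shards separate means they can be indexed once and left alone, while only the small active shard takes the incremental updates.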
I once heard that in some countries people cry when a new baby is born, because the countdown to death has begun. Whatever comes alive in this universe is going to die one day, and information is no exception. For some time it gets transacted and people access it often; then it becomes dead. People open it from the coffin only for investigation.
I am talking about this live and dead state of information. A transactional relational database is good for storing live information. So why should we keep the dead information there too, with just a flag saying inactive or whatever you use in your application? I use states like "closed, settled, withdrawn, rejected, ...", but all the records still stay together. Imagine, in real life, the dead and the living staying in the same house, with just a name board on the dead people's rooms saying "Late Mr. XXX".
I can't; it would be a horrible experience. So why do it with data? This was my inspiration to architect onedata. Here information flows from the live storage to the dead storage: from the relational database to the search engine. A search queries both places, unifies the results, and presents them as one data set. The live storage stays free to handle the frequent transactions.
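A minimal sketch of the live-to-dead flow follows, assuming stand-ins rather than the author's stack: SQLite plays the relational (live) store instead of MySQL, a dict plays the search engine instead of Solr, and the class name `OneDataSketch` and the set of terminal states are hypothetical (the states are borrowed from the examples above).

```python
import sqlite3
from typing import Dict, List

# Assumed terminal states, taken from the examples in the post.
DEAD_STATES = {"closed", "settled", "withdrawn", "rejected"}


class OneDataSketch:
    """Hypothetical live store + archive with one unified search."""

    def __init__(self) -> None:
        self.live = sqlite3.connect(":memory:")
        self.live.execute(
            "CREATE TABLE records (id TEXT PRIMARY KEY, state TEXT, body TEXT)"
        )
        self.archive: Dict[str, str] = {}  # stand-in for the search engine

    def upsert(self, rid: str, state: str, body: str) -> None:
        with self.live:
            self.live.execute(
                "INSERT OR REPLACE INTO records VALUES (?, ?, ?)", (rid, state, body)
            )

    def archive_dead(self) -> int:
        """Move records in a terminal state from the live table to the archive."""
        states = sorted(DEAD_STATES)
        placeholders = ",".join("?" * len(states))
        rows = self.live.execute(
            f"SELECT id, body FROM records WHERE state IN ({placeholders})", states
        ).fetchall()
        with self.live:
            for rid, body in rows:
                self.archive[rid] = body  # copy to the archive first...
                self.live.execute("DELETE FROM records WHERE id = ?", (rid,))  # ...then drop
        return len(rows)

    def search(self, clue: str) -> List[str]:
        """Search both stores by a substring clue and return one merged id list."""
        clue = clue.lower()
        live_hits = {
            rid
            for rid, body in self.live.execute("SELECT id, body FROM records")
            if clue in body.lower()
        }
        dead_hits = {rid for rid, body in self.archive.items() if clue in body.lower()}
        return sorted(live_hits | dead_hits)


store = OneDataSketch()
store.upsert("r1", "open", "Acme renewal")
store.upsert("r2", "settled", "Acme claim")
store.archive_dead()          # r2 leaves the live table
print(store.search("acme"))   # ['r1', 'r2']
```

The caller sees one result list regardless of where a record currently lives, which is the "one data" view described above; the live table only ever holds open records.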
The advantages
- The database storage stays very lightweight.
- A search engine is good at finding information from a clue. Once a request has gone into storage, people usually remember only a clue to find it again, so this kind of retrieval aligns with human nature.
- Archival becomes easy: handling large volumes of information is in the DNA of these stores.
The Solution
I have implemented this for my product using Solr and MySQL, and I will publish it on SourceForge soon; I have requested a project named oneline for it. If you are interested, let me know, and I will inform you once I release the alpha files. It returns XML results to clients. Nowadays RIA technologies such as Flex, as well as XSLT, are very good at processing XML results. In addition, Carrot2-style clustering of the information gives end users a boost in finding information quickly. Don't forget to write a comment. If you liked it, just express yourself: your viewpoints, even a single comment, will keep us writing. A snippet of it: