Monday 5 January 2015

Import Oracle Data, Configuration, Create Index and Query

These are steps for creation of index using Solr.
Required: install solr in local system or on a server and check the configurations of Java.
Step 1: Denormalize data into one table and create necessary indexes for faster import
Step 2: Make sure Oracle JDBC driver (JAR file) is present in your system
Step 3: Edit solrconfig.xml and add the below line, mention proper jar file name for the driver

<lib dir="../../../dist/" regex="ojdbc14-1.0.jar" />

Step 4: Add below line as well in solrconfig.xml if missing

<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

Step 5: A request handler needs to be added as well in solrconfig.xml, mention correct file name for db data config xml

     <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
       <str name="config">db-data-config.xml</str>
      </lst>
    </requestHandler>

Step 6: Edit the db-data-config.xml file with the Oracle DB instance details like host, port and user/password

Step 7: give an entity name for the query on the table from which data will be imported
Step 8: Mention the SQL query that will be used for import, it is convenient to mention actual column names instead of mentioning * in the SQL query so only the required columns will be indexed and names will be there for reference to use while defining fields for search
Step 9: while defining fields, column will be the actual column name in the table and name will be the alias used later while indexing and querying

Here is the sample db-data-config.xml 

<dataConfig>
    <dataSource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:apps/secretpwd@//hostname:1520/service name"/>
    <document>
        <entity name="item" query="select INVENTORY_ITEM_ID,ITEM_NAME,PRODUCT_FAMILY,BUSINESS_GROUP from XXITM_ITEM_DATA_STAGE where icc='License'">
            <field column="INVENTORY_ITEM_ID" name="id" />
             <field column="ITEM_NAME" name="itemname" />
            <field column="PRODUCT_FAMILY" name="pf" />
            <field column="BUSINESS_GROUP" name="bg" />
        </entity>
    </document>
</dataConfig>

While mentioning the URL, if the Oracle DB is RAC then you need to use the below format

<dataSource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:apps/secretpwd@//hostname:1520/service name"/>

and if SID needs to be mentioned then use the below format

<dataSource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@hostname:1520:sid" user="apps"  password="secretpwd"/>

Step 10: Add the field names that was added in db-data-config.xml as well in schema.xml

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="itemname" type="string" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="pf" type="string" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="bg" type="string" indexed="true" stored="true" required="true" multiValued="false" />

Step 11: Place all the files in the appropriate conf folder under collection folder

\solr-4.10.2\example\solr\collection1\conf

Step 12: Open Solr Console and click on Dataimport link.
Step 13: Select full-import command and select the entity name, entity name is the one that is specified in db-data-config.xml for the query that is used for import
Step 14: Click on execute and refresh status to know the status of indexing
Step 15: Once the index is created, click on query to verify the index
Step 16: name of the field can be used to query a specific data

Step 17: for incremental sync of index db-data-config.xml needs to be modified to include delta import query
Step 18: Java routines may be written as well for delta import