Wednesday, December 31, 2008

Oracle Text Index Types

CONTEXT:
Use this index to build a text retrieval application when your text consists of large coherent documents like MS Word, HTML or plain text.This index type requires CTX_DDL.SYNC_INDEX after DML on base
table.

CTXCAT:
Typically you index small documents or text fragments. Other columns in the base table, such as item names, prices and descriptions can be included in the index to improve mixed query performance. This index automatically updates itself after DML to base table.This index is larger and takes longer to build than a CONTEXT index.The CTXCAT index does not support table and index partitioning, documents services
(highlighting, markup, themes, and gists) or query services (explain, query feedback, and browse words.)

CTXRULE:
Use CTXRULE index to build a document classification or routing application.

Tuesday, December 30, 2008

Steps to use Classification Application with Oracle Text

Create User:
http://sadatdba.blogspot.com/2008/12/steps-to-use-oracle-text.html

Create the Rule Table:
CREATE TABLE queries (query_id NUMBER,query_string VARCHAR2(80));
INSERT INTO queries VALUES (1, 'oracle');
INSERT INTO queries VALUES (2, 'larry or ellison');
INSERT INTO queries VALUES (3, 'oracle and text');
INSERT INTO queries VALUES (4, 'market share');

Create Your CTXRULE Index:
CREATE INDEX queryx ON queries(query_string) INDEXTYPE IS CTXRULE;

Classify with MATCHES:
COLUMN query_string FORMAT a35;
SELECT query_id,query_string FROM queries WHERE MATCHES(query_string, 'Oracle announced that its market share in databases increased over the last year.')>0;
QUERY_ID QUERY_STRING
---------- -----------------------------------
1                      oracle
4                      market share

Step to use Catalog Application with Oracle Text

Create User:
http://sadatdba.blogspot.com/2008/12/steps-to-use-oracle-text.html

Create Table:
CREATE TABLE auction(item_id NUMBER, title VARCHAR2(100), category_id NUMBER,
price NUMBER, bid_close DATE);

INSERT INTO AUCTION VALUES(1, 'NIKON CAMERA', 1, 400, '24-OCT-2002');
INSERT INTO AUCTION VALUES(2, 'OLYMPUS CAMERA', 1, 300, '25-OCT-2002');
INSERT INTO AUCTION VALUES(3, 'PENTAX CAMERA', 1, 200, '26-OCT-2002');
INSERT INTO AUCTION VALUES(4, 'CANON CAMERA', 1, 250, '27-OCT-2002');

Create the Sub-Index to Order by Price:
EXEC CTX_DDL.CREATE.INDEXT_SET('auction_iset');
EXEC CTX_DDL.ADD_INDEX('auction_iset','price'); /* sub-index A*/

Create the CTXCAT Index:
CREATE INDEX auction_titlex ON AUCTION(title) INDEXTYPE IS CTXSYS.CTXCAT PARAMETERS ('index set auction_iset');

Querying Your Table with CATSEARCH:
COLUMN title FORMAT a40;
SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0;
SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'price <= 300')>0;

Update Your Table:
You can update your catalog table by adding new rows. When you do so, the CTXCAT index is automatically synchronized to reflect the change.

INSERT INTO AUCTION VALUES(5, 'FUJI CAMERA', 1, 350, '28-OCT-2002');
INSERT INTO AUCTION VALUES(6, 'SONY CAMERA', 1, 310, '28-OCT-2002');
SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0;

Steps to use Oracle Text

Create a User and grant the previliges:
CREATE USER ora_text IDENTIFIED BY therap;
GRANT RESOURCE, CONNECT, CTXAPP TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_CLS TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_THES TO ora_text;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO ora_text;

Select Query using Oracle Text:
CONNECT ora_text;
CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
INSERT INTO docs VALUES(1, '<html>California is a state in the US.</html>');
INSERT INTO docs VALUES(2, '<html>Paris is a city in France.</html>');
INSERT INTO docs VALUES(3, '<html>France is in Europe.</html>');

Create the Index.

CREATE INDEX idx_docs ON docs(text)
INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
('FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.HTML_SECTION_GROUP');

Select the document.
COLUMN text FORMAT a40;
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'France', 1) > 0;

Highlight the result.
This PL/SQL example uses the in-memory version of CTX_DOC.MARKUP to highlight the word France in document 3.
set serverout on
DECLARE
mklob CLOB;
amt NUMBER := 40;
line VARCHAR2(80);
BEGIN
CTX_DOC.MARKUP('idx_docs','3','France', mklob);
DBMS_LOB.READ(mklob, amt, 1, line);
DBMS_OUTPUT.PUT_LINE('FIRST 40 CHARS ARE:'||line);
DBMS_LOB.FREETEMPORARY(mklob);
END;
/

Synchronize the Index After Data Manipulation:
INSERT INTO docs VALUES(4, '<html>Los Angeles is a city in California.</html>');
INSERT INTO docs VALUES(5, '<html>Mexico City is big.</html>');
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0;

It will show the previous result.Therefore, synchronize the index with 2Mb of memory, and rerun the query.
EXEC CTX_DDL.SYNC_INDEX('idx_docs', '2M');
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0;

Oracle Text

Oracle Text allows text searches to be combined with regular database searches in a single SQL statement. It can find documents based on their textual content, metadata, or attributes. The Oracle Text SQL API makes it simple and intuitive to create and maintain Text indexes and run Text searches.


Oracle Text Index Types:

There are three Text index types to cover all text search needs.
* Standard index type for traditional full-text retrieval over documents and Web pages. The context index type provides a rich set of text search capabilities for finding the content you need, without returning pages of spurious results.
* Catalog index type, designed specifically for e-Business catalogs. This catalog index provides flexible searching and sorting at Web-speed.
* Classification index type for building classification or routing applications. This index is created on a table of queries, where the queries define the classification or routing criteria.

search engine

Custom Search