------------------------------------- Reply from a helpful contributor -------------------------------------
Andi Vajda: To access your class(es) by name from Python, you must have JCC generate wrappers for them. This is what is done from line 177 onward in PyLucene's Makefile. The easiest way to add your own Java classes to PyLucene is to create another jar file with your own analyzer classes and code, and add it to the JCC invocation there.
For example, the Makefile snippet in question is the JCC invocation that generates the wrapper classes; your jar goes in as an extra --jar argument, as sketched below.
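This is a minimal sketch, not the actual Makefile contents: the variable names and the exact set of flags vary by PyLucene release, and my-analyzers.jar is a hypothetical jar name (--jar, --python, --build and --install are real JCC options):

    GENERATE=$(JCC) --jar $(LUCENE_JAR) \
                    --jar my-analyzers.jar \
                    --python lucene \
                    --build --install

JCC's --jar option generates Python wrappers for every public class in the named jar, so your classes become accessible by name from Python.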
Then rebuild PyLucene. That should be all you need to do. Your jar file will be installed alongside Lucene's in the lucene egg, and it will be added to lucene.CLASSPATH, the classpath you pass to lucene.initVM().
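At runtime this looks like the usual PyLucene startup; a sketch, assuming the JCC-based module layout, where MyAnalyzer stands in for one of your wrapped classes:

    import lucene

    # lucene.CLASSPATH now includes your jar alongside Lucene's
    lucene.initVM(lucene.CLASSPATH)
    analyzer = lucene.MyAnalyzer()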
Your classes can be declared in any Java package you want. Just make sure that their names don't clash with Lucene class names you also need to use, as the class namespace is flattened in PyLucene; see the illustration below.
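To illustrate the flattening: every wrapped class, whatever its Java package, surfaces as a top-level name in the lucene module. MyAnalyzer here is a hypothetical class of yours:

    # org.apache.lucene.analysis.standard.StandardAnalyzer
    from lucene import StandardAnalyzer
    # com.example.MyAnalyzer lands in the very same flat namespace,
    # so naming your own class 'StandardAnalyzer' would collide
    from lucene import MyAnalyzer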
For more information about JCC and its command line args see JCC's README file at [1].
Technically, the PyLucene programmer is not providing an 'extension'
but a Python implementation of a set of methods encapsulated by a
Python class whose instances are wrapped by the Java proxies provided
by PyLucene.
For example, the code below, extracted from a PyLucene unit test,
defines a custom analyzer using a custom token stream that returns the
tokens '1', '2', '3', '4', '5' for any document it is called on.
All that is needed in order to provide a custom analyzer in Python is
defining a class that implements a method called 'tokenStream'. The
presence of the 'tokenStream' method is detected by the corresponding
SWIG type handler, and the Python instance passed in is wrapped by a new
Java PythonAnalyzer instance that extends Lucene's abstract Analyzer
class.
In other words, SWIG in reverse.
    # Extracted from a PyLucene unit test; the import assumes the SWIG-era
    # module name PyLucene (in JCC-based builds the classes live in 'lucene').
    from PyLucene import Token, RAMDirectory, IndexWriter, Document, Field

    class _analyzer(object):
        def tokenStream(self, fieldName, reader):
            class _tokenStream(object):
                def __init__(self):
                    self.tokens = ['1', '2', '3', '4', '5']
                    self.increments = [1, 2, 1, 0, 1]
                    self.i = 0
                def next(self):
                    # returning None signals the end of the stream
                    if self.i == len(self.tokens):
                        return None
                    t = Token(self.tokens[self.i], self.i, self.i)
                    t.setPositionIncrement(self.increments[self.i])
                    self.i += 1
                    return t
            return _tokenStream()

    analyzer = _analyzer()
    store = RAMDirectory()
    writer = IndexWriter(store, analyzer, True)

    d = Document()
    d.add(Field.Text("field", "bogus"))
    writer.addDocument(d)
    writer.optimize()
    writer.close()
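A quick way to see the custom analyzer at work is to search the resulting index. A sketch using the same pre-2.x Lucene API as above (the Hits-returning search() was removed in later Lucene versions):

    from PyLucene import IndexSearcher, TermQuery, Term

    searcher = IndexSearcher(store)
    hits = searcher.search(TermQuery(Term("field", "1")))
    # the document matches '1' even though its stored text was "bogus",
    # because indexing went through the custom token stream
    searcher.close()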