Oracle® Database Globalization Development Kit Java API Reference 10g Release 2 (10.2) Part No. B14224-02
java.lang.Object
  +--oracle.i18n.lcsd.LCSDetector
The LCSDetector class contains methods that automatically detect the language, the character encoding, or both, of text input.

To use the class, create an LCSDetector instance. Call the LCSDetector(String name) constructor to specify a profile, or the no-argument LCSDetector() constructor to use the standard profile. Which profile to choose depends on the content of the text you plan to sample, because certain profiles may yield more accurate results. For example, if you are sampling medical journals, you may want to use a profile built mainly from medical journals. If you are sampling computer-related white papers, a profile built from similar documents improves detection accuracy. Currently, only one standard profile is provided, for general-purpose detection.

Detection begins when you call a detect() method, for example detect(byte[]). Statistics are accumulated each time a detect() method is called. When you are ready for the result, call the getResult() method to retrieve an LCSDResultSet instance. To begin a new detection with the same LCSDetector instance, call the reset() method to remove the accumulated statistics.
See Also:
LCSDResultSet
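The call sequence described above (construct, call detect() one or more times, call getResult(), then reset()) can be sketched as follows. So that the sketch runs without the GDK library on the classpath, it uses a hypothetical stand-in class, ToyDetector, which only mimics the documented accumulate/result/reset contract by counting bytes. It performs no real language or character set detection, and its String result is an assumption made for this example, not the LCSDResultSet type.

```java
import java.nio.charset.StandardCharsets;

final class DetectorLifecycleDemo {

    // Hypothetical stand-in, NOT the GDK class: it only mimics the documented
    // accumulate / getResult / reset contract by counting bytes.
    static final class ToyDetector {
        private int asciiBytes;     // bytes in 0x00-0x7F seen since the last reset
        private int nonAsciiBytes;  // bytes in 0x80-0xFF seen since the last reset

        // Accumulates statistics; may be called repeatedly before getResult().
        void detect(byte[] input) {
            for (byte b : input) {
                if (b >= 0) {
                    asciiBytes++;
                } else {
                    nonAsciiBytes++;
                }
            }
        }

        // Returns null when nothing has been sampled, mirroring the documented
        // "null if the sampling data is not enough" behavior.
        String getResult() {
            if (asciiBytes + nonAsciiBytes == 0) {
                return null;
            }
            return nonAsciiBytes == 0 ? "ascii-only" : "contains-non-ascii";
        }

        // Clears the accumulated statistics so a new detection can begin.
        void reset() {
            asciiBytes = 0;
            nonAsciiBytes = 0;
        }
    }

    public static void main(String[] args) {
        ToyDetector detector = new ToyDetector();
        // Two detect() calls contribute to the same accumulated statistics.
        detector.detect("plain text".getBytes(StandardCharsets.US_ASCII));
        detector.detect("caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(detector.getResult()); // contains-non-ascii
        detector.reset();
        System.out.println(detector.getResult()); // null
    }
}
```

With the GDK available, the same sequence would apply to LCSDetector itself, whose getResult() method returns an LCSDResultSet rather than a String.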
Constructor Summary
LCSDetector() | Constructor which uses the standard default profile.
LCSDetector(String name) | Constructor which takes a profile name and allows you to choose a profile other than the default.
Method Summary
void | detect(byte[] input) | Statistical data is accumulated in an internal structure when the detect() methods are called.
int | detect(byte[] input, int offset, int length) | Statistical data is accumulated in an internal structure when the detect() methods are called.
void | detect(char[] input) | Statistical data is accumulated in an internal structure when the detect() methods are called.
int | detect(char[] input, int offset, int length) | Statistical data is accumulated in an internal structure when the detect() methods are called.
void | detect(InputStream input) | Statistical data is accumulated in an internal structure when the detect() methods are called.
int | detect(InputStream input, int length) | Statistical data is accumulated in an internal structure when the detect() methods are called.
void | detect(String input) | Statistical data is accumulated in an internal structure when the detect() methods are called.
int | detect(String input, int length) | Statistical data is accumulated in an internal structure when the detect() methods are called.
oracle.i18n.lcsd.LCSDResultSet | getResult() | Determines the top-ranking language/character set pairs from the accumulated statistical data.
static boolean | isCharsetSupported(int charsettype, String charset) | Checks whether the given character set, which is equivalent to an Oracle, IANA, or Java character set, is supported by the detection feature.
void | reset() | Resets the statistical data for all pairs to 0.
void | setCharacterSetFilter(String charset) | Sets the character set filter if you know the character set of the input data.
void | setLanguageFilter(String language) | Sets the language filter if you know the language of the input data.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail

public LCSDetector()
Constructor which uses the standard default profile.

public LCSDetector(String name)
Constructor which takes a profile name and allows you to choose a profile other than the default.
Parameters:
name - name of profile to use
Throws:
IllegalArgumentException - if an invalid profile name is specified

Method Detail

public void setCharacterSetFilter(String charset)
Sets the character set filter if you know the character set of the input data.
Parameters:
charset - IANA character set name
Throws:
IllegalArgumentException - if an invalid character set is specified

public void setLanguageFilter(String language)
Sets the language filter if you know the language of the input data.
Parameters:
language - ISO language name
Throws:
IllegalArgumentException - if an invalid language is specified

public void detect(byte[] input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics.
Parameters:
input - the bytes to be sampled by the detect() method

public int detect(byte[] input, int offset, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified length of bytes is sampled.
Parameters:
input - the bytes to be sampled by the detect() method
offset - the index of the first byte to sample
length - the number of bytes to sample
Returns:
-1 if the end of the array is reached
Throws:
IllegalArgumentException - call the reset() method

public void detect(InputStream input) throws IOException
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire stream is sampled by the detect() method.
Parameters:
input - InputStream to be sampled by the detect() method
Throws:
IOException - if an error occurs while operating on the stream
IllegalArgumentException - call the reset() method

public int detect(InputStream input, int length) throws IOException
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified length of bytes is sampled.
Parameters:
input - InputStream to be sampled by the detect() method
length - the number of bytes to sample
Returns:
-1 if the end of the stream is reached
Throws:
IOException - if an error occurs while operating on the stream
IllegalArgumentException - call the reset() method

public void detect(String input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire string is sampled by the detect() method.
Parameters:
input - the string to be sampled by the detect() method

public int detect(String input, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified length of characters is sampled.
Parameters:
input - a string to be sampled by the detect() method
length - the number of characters to sample
Returns:
-1 if the end of the string is reached
Throws:
IllegalArgumentException - call the reset() method

public void detect(char[] input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire array is sampled by the detect() method.
Parameters:
input - the characters to be sampled by the detect() method

public int detect(char[] input, int offset, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified length of characters is sampled.
Parameters:
input - the char array to be sampled by the detect() method
offset - the index of the first character to sample
length - the number of characters to sample
Returns:
-1 if the end of the array is reached
Throws:
IllegalArgumentException - call the reset() method

public oracle.i18n.lcsd.LCSDResultSet getResult()
Determines the top-ranking language/character set pairs from the accumulated statistical data.
Returns:
an LCSDResultSet object which contains the result, or null if the sampling data is not enough

public static boolean isCharsetSupported(int charsettype, String charset)
Checks whether the given character set, which is equivalent to an Oracle, IANA, or Java character set, is supported by the detection feature. See LocaleMapper for the parameter ORACLE, IANA, or JAVA.
Parameters:
charsettype - can be ORACLE, IANA, or JAVA
charset - the given character set
Returns:
true if the given character set is supported by the detection feature, or false if not
Throws:
IllegalArgumentException - if an invalid profile is specified

public void reset()
Resets the statistical data for all pairs to 0.
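The length-limited detect() overloads described above sample only part of their input. As a rough caller-side illustration of that idea, the sketch below reads at most a given number of bytes from an InputStream using only java.io, and returns -1 when the stream is already at its end. The helper name readSample and its return convention (the number of bytes read, or -1) are assumptions made for this example; the reference above documents only the -1 case for detect(InputStream, int).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedSampleDemo {

    // Hypothetical helper, not part of the GDK: fills buf with up to `length`
    // bytes from `in`. Returns the number of bytes read, or -1 if the stream
    // was already at its end and no byte could be read.
    static int readSample(InputStream in, byte[] buf, int length) throws IOException {
        if (length < 0 || length > buf.length) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        int total = 0;
        while (total < length) {
            // read() may return fewer bytes than requested, so loop until the
            // sample is full or the stream ends.
            int n = in.read(buf, total, length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return (total == 0 && length > 0) ? -1 : total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] buf = new byte[4];
        System.out.println(readSample(in, buf, 4)); // 4
        System.out.println(readSample(in, buf, 4)); // 1
        System.out.println(readSample(in, buf, 4)); // -1
    }
}
```

A sample obtained this way could then be passed to detect(byte[] input, int offset, int length), with offset 0 and the returned count as the length.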