Owl_nlp_tfidf
NLP: TFIDF module
Type of a TFIDF model
``term_freq term_count num_words`` calculates the term frequency weight.
``doc_freq doc_count num_docs`` calculates the document frequency weight.
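As a rough illustration of how these two weights combine into a TFIDF weight, here is a minimal standard-library sketch. The types ``tf_kind`` and ``df_kind``, the ``_sketch`` function names, and the formulas (raw count, relative frequency, ``log (num_docs / doc_count)``) are assumptions based on common textbook definitions; they are not taken from this module's implementation.

```ocaml
(* Hypothetical weighting schemes. The constructor names echo the defaults
   mentioned on this page, but the formulas are textbook assumptions. *)
type tf_kind = Count | Frequency
type df_kind = Unary | Idf

(* Term frequency weight from a term's count and the document length. *)
let term_freq_sketch kind term_count num_words =
  match kind with
  | Count -> term_count
  | Frequency -> term_count /. num_words

(* Document frequency weight from the number of documents containing the
   term and the total number of documents. *)
let doc_freq_sketch kind doc_count num_docs =
  match kind with
  | Unary -> 1.
  | Idf -> log (num_docs /. doc_count)

(* The TFIDF weight of a term is the product of the two weights. *)
let tfidf_weight tf df ~term_count ~num_words ~doc_count ~num_docs =
  term_freq_sketch tf term_count num_words
  *. doc_freq_sketch df doc_count num_docs
```

Under these assumptions, a term that appears in every document gets an ``Idf`` weight of ``log 1 = 0`` and therefore contributes nothing to the vector.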
Return the corpus contained in the TFIDF model.
Get the file handle associated with the TFIDF model.
``doc_count_of tfidf w`` calculates the document frequency for a given word ``w``.
``doc_count vocab fname`` counts the occurrences of all words across all documents contained in the raw text corpus of file ``fname``.
``term_count count doc`` counts the term occurrences in a document and saves the result in the ``count`` hash table.
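The counting step can be sketched with the standard ``Hashtbl`` module. This assumes a document is an array of vocabulary indices and that counts are stored as floats; both are assumptions for illustration, and ``term_count_sketch`` is a hypothetical name rather than this module's function.

```ocaml
(* Count how often each token index occurs in [doc], accumulating into
   [count]. Assumes [doc] is an array of vocabulary indices. *)
let term_count_sketch (count : (int, float) Hashtbl.t) (doc : int array) =
  Array.iter
    (fun w ->
      let c = Option.value (Hashtbl.find_opt count w) ~default:0. in
      Hashtbl.replace count w (c +. 1.))
    doc

let () =
  let count = Hashtbl.create 16 in
  term_count_sketch count [| 4; 7; 4; 4 |];
  (* Index 4 occurs three times and index 7 once. *)
  Printf.printf "%g %g\n" (Hashtbl.find count 4) (Hashtbl.find count 7)
```

Because the table is passed in rather than created inside the function, the same table can be reused across calls, which matches the "saves the result in" wording above.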
val doc_to_vec :
(float, 'a) Bigarray.kind ->
t ->
(int * float) array ->
(float, 'a) Owl_dense.Ndarray.Generic.t
``doc_to_vec kind tfidf vec`` converts a TFIDF vector from its sparse representation to a dense ndarray vector whose length equals the vocabulary size.
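The sparse-to-dense conversion can be pictured as follows. This sketch uses a plain ``float array`` in place of an Owl ndarray and takes the vocabulary size as an explicit argument; ``to_dense_sketch`` and ``vocab_size`` are hypothetical names introduced for illustration only.

```ocaml
(* Expand a sparse [(vocabulary index, weight)] array into a dense float
   array of length [vocab_size]. Indices absent from [vec] stay at 0. *)
let to_dense_sketch vocab_size (vec : (int * float) array) =
  let dense = Array.make vocab_size 0. in
  Array.iter (fun (i, w) -> dense.(i) <- w) vec;
  dense

let () =
  let dense = to_dense_sketch 5 [| (1, 0.5); (3, 2.) |] in
  Array.iter (Printf.printf "%g ") dense;
  print_newline ()
```

The dense form is convenient for linear algebra but costs memory proportional to the vocabulary size for every document, which is presumably why the model stores vectors sparsely.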
Return the ith TFIDF vector in the model. The return value is an array of ``(vocabulary index, weight)`` tuples for the document.
Return the next document vector in the model. The return value is an array of ``(vocabulary index, weight)`` tuples for the document.
Return the next batch of document vectors in the model. The default batch size is 100.
Iterate over all the document vectors in a TFIDF model. Each document vector is an array of ``(vocabulary index, weight)`` tuples.
Map over all the document vectors in a TFIDF model. Each document vector is an array of ``(vocabulary index, weight)`` tuples.
This function builds a TFIDF model according to the passed-in parameters.
Parameters:
* ``norm``: whether to normalise the vectors in the TFIDF model. The default is ``false``.
* ``sort``: whether to sort the terms in a TFIDF vector in increasing order of their vocabulary indices. The default is ``false``.
* ``tf``: type of term frequency used in building the TFIDF model. The default is ``Count``.
* ``df``: type of document frequency used in building the TFIDF model. The default is ``Idf``.
* ``corpus``: the corpus built by the ``Owl_nlp_corpus`` module, on top of which the TFIDF model will be built.
``save tfidf fname`` saves the TFIDF model to the file named ``fname``.
Convert a TFIDF model to its string representation, which contains summary information.
Convert a single document according to a given model.
``normalise x`` makes ``x`` a unit vector by dividing it by its L2 norm.
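For a sparse vector, L2 normalisation might look like the sketch below. It assumes ``x`` is an array of ``(vocabulary index, weight)`` tuples as described elsewhere on this page; ``normalise_sketch`` is a hypothetical name, and the handling of the zero vector is a choice made for this example only.

```ocaml
(* Scale a sparse vector so that its L2 norm is 1. A zero vector is
   returned unchanged to avoid division by zero (a choice made for this
   sketch only). *)
let normalise_sketch (x : (int * float) array) =
  let norm = sqrt (Array.fold_left (fun acc (_, w) -> acc +. (w *. w)) 0. x) in
  if norm = 0. then x else Array.map (fun (i, w) -> (i, w /. norm)) x
```

For example, the vector with weights 3 and 4 has L2 norm 5, so its normalised weights are 3/5 and 4/5. Only the weights change; the vocabulary indices are kept as they are.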
Wrap up a TFIDF model type. This is a low-level function; you are not supposed to use it directly.
val all_pairwise_distance :
Owl_nlp_similarity.t ->
t ->
('a * float) array ->
(int * float) array
Calculate the pairwise distances for the whole model. The return format is an ``(id, dist)`` array.
val nearest :
?typ:Owl_nlp_similarity.t ->
t ->
('a * float) array ->
int ->
(int * float) array
Return the K nearest neighbours. It is very slow because it uses linear search.
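To see why a linear search is slow, consider the standard-library sketch below: every query computes a distance to each document and then sorts all of them. Cosine distance is used here as an assumed metric, the sparse vectors are assumed to be non-zero and sorted by vocabulary index, and ``cosine_distance`` and ``nearest_sketch`` are hypothetical names rather than this module's functions.

```ocaml
(* Cosine distance between two non-zero sparse vectors whose entries are
   sorted by increasing vocabulary index. *)
let cosine_distance (a : (int * float) array) (b : (int * float) array) =
  let dot = ref 0. and i = ref 0 and j = ref 0 in
  while !i < Array.length a && !j < Array.length b do
    let ia, wa = a.(!i) and ib, wb = b.(!j) in
    if ia = ib then (dot := !dot +. (wa *. wb); incr i; incr j)
    else if ia < ib then incr i
    else incr j
  done;
  let norm v = sqrt (Array.fold_left (fun s (_, w) -> s +. (w *. w)) 0. v) in
  1. -. (!dot /. (norm a *. norm b))

(* Linear-search k-nearest neighbours: score every document, sort by
   distance, and keep the first [k] results as [(id, dist)] pairs. *)
let nearest_sketch docs query k =
  let scored = Array.mapi (fun id d -> (id, cosine_distance d query)) docs in
  Array.sort (fun (_, x) (_, y) -> compare x y) scored;
  Array.sub scored 0 (min k (Array.length scored))
```

The cost grows linearly with the number of documents for the distance pass, plus the sort, on every query. An index structure would be needed to do better, which is outside the scope of this sketch.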