Knowledge base

The class KnowledgeBase is the main access point to all resources described in the knowledge base (e.g. HucitAuthor, HucitWork, etc.). Its methods can be divided into the following high-level groups:

class hucitlib.KnowledgeBase(config_file: str = None)

KnowledgeBase is a class that allows for accessing a HuCit knowledge base in an object-oriented fashion. The abstraction layer it provides means that you can use, search and modify its content without having to worry about the underlying modelling of data in RDF.

Parameters:config_file (str) – Path to the configuration file containing the parameters to connect to the triple store whose data will be accessible via the KnowledgeBase object.

Note

By default (i.e. when no configuration file is specified), a new KnowledgeBase instance reads data directly from the triple store hosted at Druid. Note that methods that modify entries in the KB will not work in this case, as that triple store is read-only.

Example of usage:

>>> from hucitlib import KnowledgeBase
>>> kb = KnowledgeBase()
>>> homer = kb.get_resource_by_urn('urn:cts:greekLit:tlg0012')
>>> print(homer.rdfs_label.one)

add_textelement_type(label: str, lang: str = 'en') → Optional[surf.resource.Resource]

Adds a new TextElementType to the knowledge base if not yet present.

Parameters:
  • label (str) – Label of the text element type (e.g. book).
  • lang (str) – Language of the label (defaults to 'en').
Returns:

Description of returned object.

Return type:

Optional[surf.resource.Resource]

# this will work only when connecting to a triple store
# to which you have write access
>>> from hucitlib import KnowledgeBase
>>> kb = KnowledgeBase()
>>> element_type_obj = kb.add_textelement_type("book")

add_textelement_types(types: List[str]) → None

Adds each of the given text element types if it does not yet exist.

Parameters:types (List[str]) – a list of strings (e.g. ["book", "poem", "line"])
Return type:None
# this will work only when connecting to a triple store
# to which you have write access
>>> from hucitlib import KnowledgeBase
>>> kb = KnowledgeBase()
>>> kb.add_textelement_types(["book", "line"])

author_names

Returns a dictionary like this:

{
    "urn:cts:greekLit:tlg0012$$n1" : "Homer"
    , "urn:cts:greekLit:tlg0012$$n2" : "Omero"
    , ...
}
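
As a minimal sketch of how such a dictionary might be consumed, the snippet below groups the names by author. It assumes, based only on the sample above, that the part of each key before `$$` is the author's CTS URN; the helper `names_by_author` is hypothetical and not part of hucitlib.

```python
from collections import defaultdict
from typing import Dict, List


def names_by_author(author_names: Dict[str, str]) -> Dict[str, List[str]]:
    """Group name strings by the URN prefix of each key (hypothetical helper)."""
    grouped = defaultdict(list)
    for key, name in author_names.items():
        # Assumption: keys look like "<author URN>$$<name id>".
        author_urn, _, _ = key.partition("$$")
        grouped[author_urn].append(name)
    return dict(grouped)


# Sample data copied from the dictionary shown above.
sample = {
    "urn:cts:greekLit:tlg0012$$n1": "Homer",
    "urn:cts:greekLit:tlg0012$$n2": "Omero",
}
print(names_by_author(sample))
```
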
create_cts_urn(resource: surf.resource.Resource, urn_string: str) → Optional[surf.resource.Resource]

Creates a CTS URN object and assigns it to a given resource.

Parameters:
  • resource (surf.resource.Resource) – KB entry to be identified by the CTS URN.
  • urn_string (str) – CTS URN identifier (e.g. urn:cts:greekLit:tlg0012)
Returns:

The newly created object or None if it already existed.

Return type:

Optional[surf.resource.Resource]
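
To illustrate the shape of the urn_string values used throughout this page, here is a rough sketch that splits a CTS URN into its main components. The helper `split_cts_urn` and the `CtsUrnParts` tuple are illustrative names, not part of hucitlib, and the sketch ignores version and exemplar components.

```python
from typing import NamedTuple, Optional


class CtsUrnParts(NamedTuple):
    namespace: str
    textgroup: str
    work: Optional[str]
    passage: Optional[str]


def split_cts_urn(urn_string: str) -> CtsUrnParts:
    """Split a CTS URN into its main components (illustrative helper)."""
    fields = urn_string.split(":")
    if len(fields) < 4 or fields[0] != "urn" or fields[1] != "cts":
        raise ValueError(f"Not a CTS URN: {urn_string!r}")
    namespace = fields[2]
    # The fourth field is dot-separated: textgroup, then an optional work.
    work_fields = fields[3].split(".")
    textgroup = work_fields[0]
    work = work_fields[1] if len(work_fields) > 1 else None
    # An optional fifth field holds the passage reference.
    passage = fields[4] if len(fields) > 4 and fields[4] else None
    return CtsUrnParts(namespace, textgroup, work, passage)


print(split_cts_urn("urn:cts:greekLit:tlg0012"))
print(split_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1"))
```

In this reading, an author URN carries only a text group, a work URN adds the work component, and a text element URN adds a passage reference.
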

create_text_element(work: surf.resource.Resource, urn_string: str, element_type: surf.resource.Resource, source_uri: str = None)

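Creates a new text element for the given work.

Parameters:
  • work (surf.resource.Resource) – Work the text element belongs to.
  • urn_string (str) – Text element’s URN.
  • element_type (surf.resource.Resource) – Text element type.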
Returns:

The newly created text element.

Return type:

type

>>> iliad = kb.get_resource_by_urn("urn:cts:greekLit:tlg0012.tlg001")
>>> etype_book = kb.get_textelement_type("book")
>>> ts = iliad.structure
>>> ts.create_element(
...     "urn:cts:greekLit:tlg0012.tlg001:1",
...     element_type=etype_book,
...     following_urn="urn:cts:greekLit:tlg0012.tlg001:2"
... )

get_author_label(urn)

Get the label corresponding to the author identified by the CTS URN.

The label is chosen by trying the following, in order:

  • a lang=en label (if multiple labels in this language exist, pick the shortest);
  • a lang=la label (if multiple labels in this language exist, pick the shortest);
  • a lang=None label (if multiple labels in this language exist, pick the shortest).

Returns None if no name is found.
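
The fallback order described above can be sketched in plain Python. This is only an illustration of the selection rule, not the library's implementation: labels are modelled here as hypothetical (text, language) pairs, and `pick_label` is a made-up name.

```python
from typing import Iterable, Optional, Tuple

# Each label is a (text, language) pair; language is None when untagged.
Label = Tuple[str, Optional[str]]


def pick_label(labels: Iterable[Label]) -> Optional[str]:
    """Pick a label by language preference, shortest first (illustrative)."""
    labels = list(labels)
    # Try English first, then Latin, then labels without a language tag.
    for lang in ("en", "la", None):
        candidates = [text for text, label_lang in labels if label_lang == lang]
        if candidates:
            # Several labels in the same language: keep the shortest one.
            return min(candidates, key=len)
    # No label in any of the preferred languages.
    return None


labels = [("Homerus", "la"), ("Homer the Poet", "en"), ("Homer", "en")]
print(pick_label(labels))
```
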

get_authors() → List[hucitlib.surfext.HucitAuthor]

Lists all authors contained in the knowledge base.

Returns:A list of authors.
Return type:List[HucitAuthor]
get_opus_maximum_of(author_cts_urn)

Returns the author’s opus maximum (None if not available).

Given the CTS URN of an author, this method returns that author’s opus maximum, or None if it is not available.

Parameters:author_cts_urn – the author’s CTS URN.
Returns:an instance of surfext.HucitWork or None
get_resource_by_urn(urn)

Fetch the resource corresponding to the input CTS URN.

Currently supports only HucitAuthor and HucitWork.

Parameters:urn – the CTS URN of the resource to fetch
Returns:either an instance of HucitAuthor or of HucitWork
get_statistics() → Dict[str, int]

Gather basic stats about the Knowledge Base and its contents.

Note

This method currently has some performance issues.

Returns:a dictionary
get_textelement_type(label: str) → Optional[surf.resource.Resource]

Returns a TextElementType (instance of E55_Type) if present.

Note

label (lowercased) is used to create the URI (http://purl.org/hucit/kb/types/{label}).

Parameters:label (str) – Label of the text element type (e.g. book).
Returns:The TextElementType, or None if not present.
Return type:Optional[surf.resource.Resource]
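
The note above on URI creation can be illustrated with a small sketch. It assumes nothing beyond what the note states (the label is lowercased and substituted into the URI pattern); the function name `textelement_type_uri` is hypothetical, and any further normalisation the library may apply is not shown.

```python
# URI pattern taken from the note above.
TYPE_URI_TEMPLATE = "http://purl.org/hucit/kb/types/{label}"


def textelement_type_uri(label: str) -> str:
    """Build the URI of a text element type from its label (illustrative)."""
    # The label is lowercased before being placed into the URI.
    return TYPE_URI_TEMPLATE.format(label=label.lower())


print(textelement_type_uri("Book"))
```

Under this rule, "Book" and "book" resolve to the same type URI.
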
get_textelement_types() → List[surf.resource.Resource]

Returns all TextElementTypes defined in the knowledge base.

Returns:A list of all TextElementTypes.
Return type:List[surf.resource.Resource]
get_work_label(urn)

Get the label corresponding to the work identified by the input CTS URN.

The label is chosen by trying the following, in order:

  • a lang=en label;
  • a lang=la label;
  • a lang=None label.

Returns None if no title is found.

get_works()

Lists all works contained in the knowledge base.

Returns:a list of HucitWork instances.
search(search_string: str) → List[Tuple[str, surf.resource.Resource]]

Searches for a given string through the resources’ labels.

Parameters:search_string (str) – The string to search for.
Returns:A list of (str, Resource) tuples for the matching resources.
Return type:List[Tuple[str, Resource]]
to_json()

Serialises the content of the KnowledgeBase as JSON.

Returns:TODO