pyjedai.utils
Functions
|
Retrieves features and their values from the given workflow dictionary, |
|
id1 and id2 constitute a matching pair if: - Blocks: the intersection of their block sets is non-empty (comparison of sets) - Clusters: cluster-id-j == cluster-id-i (comparison of integers) |
|
Generator function that breaks an iterable into batches of a set size. |
|
Checks for single-entity blocks. |
|
Returns the identifiers in canonical order |
|
Chi Square Method |
|
|
|
Returns the union of the elements of both lists in the order they appear in the first list |
|
Cosine similarity between two vectors |
|
Creates a dict of entity ids to block keys. |
|
Drops blocks if: |
|
Removes blocks of size one for DER and empty blocks for CCER. |
|
Returns a unique identifier used to cross-reference workflows stored in a JSON file with their performance graphs. |
|
Returns the cardinality of the blocks. |
|
Returns a list of argument names for the requested function of the given class. Parameters: class_reference (reference to a class); function_name (str, name of the requested function). |
|
Returns a list of multiples of the requested number up to n * number |
|
|
|
Returns the q-gram value from the tokenizer name. |
|
|
|
Sorts blocks in alphabetical order based on their token, shuffles the entities of each block, and concatenates the results into a single list. |
|
|
|
|
|
|
|
Checks whether the given workflow's arguments that are shared with the target arguments have values that appear in those target arguments. |
|
Configuration file contains values for source, target and ground truth dataframes |
|
Returns a subset of the given dictionary including only the given keys. |
|
Prints all the contents of the block index. |
|
Prints candidate pairs index in natural language. |
|
Prints clusters contents. |
|
|
|
Reads dataset details from a JSON file and returns a Data object. |
|
Takes a workflow dictionary or retrieves it from given path. |
|
Returns a new instance of blocks containing the entity IDs of the given blocks translated into the reverse indexing system. Parameters: blocks (dict, blocks as defined in the previous indexing); data (Data, the previous data module, used to define the reversed ids based on the previous dataset limit and dataset sizes). |
|
Returns a new data model based upon the given data model, with reversed indexing of the datasets. Parameters: data (Data, the input data model). |
|
|
|
|
|
|
|
Lower clean. |
|
|
|
Based on its performance, sets the new workflow as the top one in |
|
Values for requested parameters have been supplied by the user in the configuration file |
|
Takes a workflow dictionary or retrieves it from given path. |
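Several of the helpers summarized above are small enough to sketch in a few lines. The block below is a minimal illustration of the described behavior, not the pyjedai implementation: every function name and signature is hypothetical, and where a summary is ambiguous the choice made is stated as an assumption in a comment.

```python
import math
from typing import Dict, Hashable, Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def iter_batches(iterable: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Generator that breaks an iterable into batches of a set size.

    Assumption: the final batch may be shorter than batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    batch: List[T] = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length.

    Assumption: a zero vector yields 0.0 rather than an error.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def ordered_union(first: Sequence[K], second: Sequence[K]) -> List[K]:
    """Union of both lists, keeping the order of the first list.

    Assumption: elements found only in the second list are appended
    afterwards, in the order they appear there.
    """
    seen = set()
    result: List[K] = []
    for item in list(first) + list(second):
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dict_subset(source: Dict[K, V], keys: Iterable[K]) -> Dict[K, V]:
    """Subset of a dictionary including only the given keys.

    Assumption: keys missing from the dictionary are skipped silently.
    """
    return {key: source[key] for key in keys if key in source}


def canonical_pair(id1: int, id2: int) -> Tuple[int, int]:
    """Identifiers in canonical order.

    Assumption: "canonical" means the smaller identifier comes first.
    """
    return (id1, id2) if id1 <= id2 else (id2, id1)


def multiples(number: int, n: int) -> List[int]:
    """Multiples of number up to n * number.

    Assumption: the list starts at 1 * number.
    """
    return [number * i for i in range(1, n + 1)]
```

Under these assumptions, `list(iter_batches(range(5), 2))` produces two full batches followed by a shorter one, and `canonical_pair(7, 3)` returns the same tuple as `canonical_pair(3, 7)`, which is what makes a pair usable as a dictionary or set key regardless of argument order.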
Classes
|
Stores a dictionary [Entity -> Entity's Neighborhood Data (Whoosh Neighborhood)] |
|
Stores information about the neighborhood of a given entity ID: ID (the identifier of the entity as defined in the original dataframe); Total Weight (the total weight of the entity's neighbors); Number of Neighbors (the total number of neighbors); Neighbors (the entity's neighbors sorted in descending order of weight); Stage (insert / pop stage, with entities stored in ascending / descending weight order). |
|
|
|
For each entity identifier, stores the list of indices at which it appears within the list of shuffled entities of the sorted blocks. |
|
Auxiliary module used to store basic information about the predicted pairs that are to be emitted. |
|
Stores the indices of retained entities of the initial datasets, calculates and stores the mapping of element indices from new to old dataset (id in subset -> id in original) |
|
|
|
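The class that stores retained entity indices and maps ids in a subset back to ids in the original dataset can be pictured with a small sketch. This is only an illustration of the described mapping: the class name, attribute names, and the rule that subset ids are assigned consecutively in ascending order of original id are all assumptions, not pyjedai's actual API.

```python
from typing import Dict, Iterable, List


class SubsetIndexSketch:
    """Hypothetical holder for a subset-id -> original-id mapping."""

    def __init__(self, retained_ids: Iterable[int]) -> None:
        # Indices of the entities kept from the initial dataset
        # (assumption: duplicates are dropped and ids are sorted).
        self.retained_ids: List[int] = sorted(set(retained_ids))
        # New (subset) id -> old (original) id.
        self.subset_to_original: Dict[int, int] = dict(enumerate(self.retained_ids))

    def to_original(self, subset_id: int) -> int:
        """Translate an id in the subset into the id in the original dataset."""
        return self.subset_to_original[subset_id]


index = SubsetIndexSketch([10, 4, 7])
original_id = index.to_original(0)  # smallest retained original id
```

Keeping the mapping as a plain dictionary makes the translation a constant-time lookup, which matters when results computed on a subset have to be reported in terms of the original identifiers.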