Segmentation

Routines for image segmentation.

SegmentationEM

Bases: BaseImageSegmentationAggregator

The Segmentation EM-algorithm performs a categorical aggregation task for each pixel: whether it should be included in the resulting aggregate or not. This task is solved by the single-coin Dawid-Skene algorithm. Each worker has a latent parameter skill that shows the probability that this worker answers correctly.

Skills and true pixel labels are optimized by the Expectation-Maximization algorithm:

1. E-step. Estimates the posterior probabilities using the specified workers' segmentations, the prior probabilities for each pixel, and the workers' error probability vector.
2. M-step. Estimates the probability of a worker answering correctly using the specified workers' segmentations and the posterior probabilities for each pixel.
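Per pixel, the E-step reduces to Bayes' rule under the single-coin model. As a sketch, writing \(q_j\) for worker \(j\)'s skill, \(y_j \in \{0, 1\}\) for that worker's vote on the pixel, and \(p\) for the pixel's prior (this notation is ours, not the library's):

$$
\Pr[\text{pixel included} \mid y] = \frac{p \prod_j q_j^{y_j} (1 - q_j)^{1 - y_j}}{p \prod_j q_j^{y_j} (1 - q_j)^{1 - y_j} + (1 - p) \prod_j q_j^{1 - y_j} (1 - q_j)^{y_j}}
$$

The implementation below evaluates these products in log space for numerical stability.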

D. Jung-Lin Lee, A. Das Sarma and A. Parameswaran. Aggregating Crowdsourced Image Segmentations. CEUR Workshop Proceedings. Vol. 2173, (2018), 1-44.

https://ceur-ws.org/Vol-2173/paper10.pdf

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> from crowdkit.aggregation import SegmentationEM
>>> df = pd.DataFrame(
...     [
...         ['t1', 'p1', np.array([[1, 0], [1, 1]])],
...         ['t1', 'p2', np.array([[0, 1], [1, 1]])],
...         ['t1', 'p3', np.array([[0, 1], [1, 1]])]
...     ],
...     columns=['task', 'worker', 'segmentation']
... )
>>> result = SegmentationEM().fit_predict(df)
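For this toy input, the EM estimate coincides with the pixel-wise majority: two of the three workers mark the top-right pixel, and all of them mark the bottom row. Inspecting the aggregated mask (the output below is what these data are expected to produce with default parameters):

>>> result.loc['t1']
array([[False,  True],
       [ True,  True]])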
Source code in crowdkit/aggregation/image_segmentation/segmentation_em.py
@attr.s
class SegmentationEM(BaseImageSegmentationAggregator):
    r"""The **Segmentation EM-algorithm** performs a categorical
    aggregation task for each pixel: should it be included in the resulting aggregate or not.
    This task is solved by the single-coin Dawid-Skene algorithm.
    Each worker has a latent parameter `skill` that shows the probability that this worker answers correctly.

    Skills and true pixel labels are optimized by the Expectation-Maximization algorithm:
    1. **E-step**. Estimates the posterior probabilities using the specified workers' segmentations, the prior probabilities for each pixel,
    and the workers' error probability vector.
    2. **M-step**. Estimates the probability of a worker answering correctly using the specified workers' segmentations and the posterior probabilities for each pixel.


    D. Jung-Lin Lee, A. Das Sarma and A. Parameswaran. Aggregating Crowdsourced Image Segmentations.
    *CEUR Workshop Proceedings. Vol. 2173*, (2018), 1-44.

    <https://ceur-ws.org/Vol-2173/paper10.pdf>

    Examples:
        >>> import numpy as np
        >>> import pandas as pd
        >>> from crowdkit.aggregation import SegmentationEM
        >>> df = pd.DataFrame(
        ...     [
        ...         ['t1', 'p1', np.array([[1, 0], [1, 1]])],
        ...         ['t1', 'p2', np.array([[0, 1], [1, 1]])],
        ...         ['t1', 'p3', np.array([[0, 1], [1, 1]])]
        ...     ],
        ...     columns=['task', 'worker', 'segmentation']
        ... )
        >>> result = SegmentationEM().fit_predict(df)
    """

    n_iter: int = attr.ib(default=10)
    """The maximum number of EM iterations."""

    tol: float = attr.ib(default=1e-5)
    """The tolerance stopping criterion for iterative methods with a variable number of steps.
    The algorithm converges when the loss change is less than the `tol` parameter."""

    eps: float = 1e-15
    """The convergence threshold."""

    segmentation_region_size_: int = attr.ib(init=False)
    """Segmentation region size."""

    segmentations_sizes_: npt.NDArray[Any] = attr.ib(init=False)
    """Sizes of image segmentations."""

    priors_: Union[float, npt.NDArray[Any]] = attr.ib(init=False)
    """The prior probabilities for each pixel to be included in the resulting aggregate.
    Each probability is in the range from 0 to 1, all probabilities must sum up to 1."""

    posteriors_: npt.NDArray[Any] = attr.ib(init=False)
    """The posterior probabilities for each pixel to be included in the resulting aggregate.
    Each probability is in the range from 0 to 1, all probabilities must sum up to 1."""

    errors_: npt.NDArray[Any] = attr.ib(init=False)
    """The workers' error probability vector."""

    loss_history_: List[float] = attr.ib(init=False)
    """A list of loss values during training."""

    @staticmethod
    def _e_step(
        segmentations: npt.NDArray[Any],
        errors: npt.NDArray[Any],
        priors: Union[float, npt.NDArray[Any]],
    ) -> npt.NDArray[Any]:
        """
        Performs E-step of the algorithm.
        Estimates the posterior probabilities using the specified workers' segmentations, the prior probabilities for each pixel,
        and the workers' error probability vector.
        """

        weighted_seg = (
            np.multiply(errors, segmentations.T.astype(float)).T
            + np.multiply((1 - errors), (1 - segmentations).T.astype(float)).T
        )

        with np.errstate(divide="ignore"):
            pos_log_prob = np.log(priors) + np.log(weighted_seg).sum(axis=0)
            neg_log_prob = np.log(1 - priors) + np.log(1 - weighted_seg).sum(axis=0)

            with np.errstate(invalid="ignore"):
                # division by the denominator in the Bayes formula
                posteriors: npt.NDArray[Any] = np.nan_to_num(
                    np.exp(pos_log_prob)
                    / (np.exp(pos_log_prob) + np.exp(neg_log_prob)),
                    nan=0,
                )

        return posteriors

    @staticmethod
    def _m_step(
        segmentations: npt.NDArray[Any],
        posteriors: npt.NDArray[Any],
        segmentation_region_size: int,
        segmentations_sizes: npt.NDArray[Any],
    ) -> npt.NDArray[Any]:
        """
        Performs M-step of the algorithm.
        Estimates the probability of a worker answering correctly using the specified workers' segmentations and the posterior probabilities for each pixel.
        """

        mean_errors_expectation: npt.NDArray[Any] = (
            segmentations_sizes
            + posteriors.sum()
            - 2 * (segmentations * posteriors).sum(axis=(1, 2))
        ) / segmentation_region_size

        # return probability of worker marking pixel correctly
        return 1 - mean_errors_expectation

    def _evidence_lower_bound(
        self,
        segmentations: npt.NDArray[Any],
        priors: Union[float, npt.NDArray[Any]],
        posteriors: npt.NDArray[Any],
        errors: npt.NDArray[Any],
    ) -> float:
        weighted_seg = (
            np.multiply(errors, segmentations.T.astype(float)).T
            + np.multiply((1 - errors), (1 - segmentations).T.astype(float)).T
        )

        # we handle log(0) * 0 == 0 case with nan_to_num so warnings are irrelevant here
        with np.errstate(divide="ignore", invalid="ignore"):
            log_likelihood_expectation: float = (
                np.nan_to_num(
                    (np.log(weighted_seg) + np.log(priors)[None, ...]) * posteriors,
                    nan=0,
                ).sum()
                + np.nan_to_num(
                    (np.log(1 - weighted_seg) + np.log(1 - priors)[None, ...])
                    * (1 - posteriors),
                    nan=0,
                ).sum()
            )

            return log_likelihood_expectation - float(
                np.nan_to_num(np.log(posteriors) * posteriors, nan=0).sum()
            )

    def _aggregate_one(self, segmentations: "pd.Series[Any]") -> npt.NDArray[np.bool_]:
        """
        Performs the Expectation-Maximization algorithm for a single image.
        """
        priors = sum(segmentations) / len(segmentations)
        segmentations_np: npt.NDArray[Any] = np.stack(segmentations.values)  # type: ignore
        segmentation_region_size = segmentations_np.any(axis=0).sum()

        if segmentation_region_size == 0:
            return np.zeros_like(segmentations_np[0])

        segmentations_sizes = segmentations_np.sum(axis=(1, 2))
        # initialize with errors assuming that ground truth segmentation is majority vote
        errors = self._m_step(
            segmentations_np,
            np.round(priors),
            segmentation_region_size,
            segmentations_sizes,
        )
        loss = -np.inf
        self.loss_history_ = []
        for _ in range(self.n_iter):
            posteriors = self._e_step(segmentations_np, errors, priors)
            posteriors[posteriors < self.eps] = 0
            errors = self._m_step(
                segmentations_np,
                posteriors,
                segmentation_region_size,
                segmentations_sizes,
            )
            new_loss = self._evidence_lower_bound(
                segmentations_np, priors, posteriors, errors
            ) / (len(segmentations_np) * segmentations_np[0].size)
            priors = posteriors
            self.loss_history_.append(new_loss)
            if new_loss - loss < self.tol:
                break
            loss = new_loss

        return cast(npt.NDArray[np.bool_], priors > 0.5)

    def fit(self, data: pd.DataFrame) -> "SegmentationEM":
        """Fits the model to the training data with the EM algorithm.

        Args:
            data (DataFrame): The training dataset of workers' segmentations
                which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

        Returns:
            SegmentationEM: self.
        """

        data = data[["task", "worker", "segmentation"]]

        self.segmentations_ = data.groupby("task").segmentation.apply(
            lambda segmentations: self._aggregate_one(
                segmentations
            )  # using lambda for python 3.7 compatibility
        )
        return self

    def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
        """Fits the model to the training data and returns the aggregated segmentations.

        Args:
            data (DataFrame): The training dataset of workers' segmentations
                which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

        Returns:
            Series: Task segmentations. The `pandas.Series` data is indexed by `task` so that `segmentations.loc[task]` is the task aggregated segmentation.
        """

        return self.fit(data).segmentations_

eps: float = 1e-15

The numerical cutoff: posterior probabilities below eps are set to zero.

errors_: npt.NDArray[Any] = attr.ib(init=False)

The workers' error probability vector.

loss_history_: List[float] = attr.ib(init=False)

A list of loss values during training.

n_iter: int = attr.ib(default=10)

The maximum number of EM iterations.

posteriors_: npt.NDArray[Any] = attr.ib(init=False)

The posterior probabilities for each pixel to be included in the resulting aggregate. Each probability is in the range from 0 to 1.

priors_: Union[float, npt.NDArray[Any]] = attr.ib(init=False)

The prior probabilities for each pixel to be included in the resulting aggregate. Each probability is in the range from 0 to 1.

segmentation_region_size_: int = attr.ib(init=False)

The number of pixels covered by at least one worker's segmentation.

segmentations_sizes_: npt.NDArray[Any] = attr.ib(init=False)

The number of pixels in each worker's segmentation.

tol: float = attr.ib(default=1e-05)

The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the tol parameter.
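
A usage sketch of these knobs (the parameter values here are illustrative): after fitting, loss_history_ holds one normalized lower-bound value per completed iteration for the most recently aggregated task, which is handy for checking convergence.

>>> agg = SegmentationEM(n_iter=20, tol=1e-6)
>>> _ = agg.fit_predict(df)  # df as in the example above
>>> len(agg.loss_history_) <= 20
True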

fit(data)

Fits the model to the training data with the EM algorithm.

Parameters:

    data (DataFrame, required): The training dataset of workers' segmentations,
        represented as the pandas.DataFrame data containing task, worker, and segmentation columns.

Returns:

    SegmentationEM: self.

Source code in crowdkit/aggregation/image_segmentation/segmentation_em.py
def fit(self, data: pd.DataFrame) -> "SegmentationEM":
    """Fits the model to the training data with the EM algorithm.

    Args:
        data (DataFrame): The training dataset of workers' segmentations
            which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

    Returns:
        SegmentationEM: self.
    """

    data = data[["task", "worker", "segmentation"]]

    self.segmentations_ = data.groupby("task").segmentation.apply(
        lambda segmentations: self._aggregate_one(
            segmentations
        )  # using lambda for python 3.7 compatibility
    )
    return self

fit_predict(data)

Fits the model to the training data and returns the aggregated segmentations.

Parameters:

    data (DataFrame, required): The training dataset of workers' segmentations,
        represented as the pandas.DataFrame data containing task, worker, and segmentation columns.

Returns:

    Series: Task segmentations. The pandas.Series data is indexed by task
        so that segmentations.loc[task] is the task aggregated segmentation.

Source code in crowdkit/aggregation/image_segmentation/segmentation_em.py
def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
    """Fits the model to the training data and returns the aggregated segmentations.

    Args:
        data (DataFrame): The training dataset of workers' segmentations
            which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

    Returns:
        Series: Task segmentations. The `pandas.Series` data is indexed by `task` so that `segmentations.loc[task]` is the task aggregated segmentation.
    """

    return self.fit(data).segmentations_

SegmentationMajorityVote

Bases: BaseImageSegmentationAggregator

The Segmentation Majority Vote algorithm chooses a pixel if and only if the pixel has "yes" votes from at least half of all workers.

This method implements a straightforward approach to image segmentation aggregation: if a pixel is not inside a worker's segmentation, that worker's vote for the pixel is counted as 0. Otherwise, it is counted as 1. These categorical values are then aggregated for each pixel by the Majority Vote algorithm.

The method also supports weighted majority voting when the skills parameter is provided to the fit method, as the sketch below shows.
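
The skill values here are illustrative, not derived from real data. With these weights, p1 alone (0.9) outweighs p2 and p3 combined (0.8 out of a 1.7 total), so the aggregated top row follows p1's mask:

>>> import numpy as np
>>> import pandas as pd
>>> from crowdkit.aggregation import SegmentationMajorityVote
>>> df = pd.DataFrame(
...     [
...         ['t1', 'p1', np.array([[1, 0], [1, 1]])],
...         ['t1', 'p2', np.array([[0, 1], [1, 1]])],
...         ['t1', 'p3', np.array([[0, 1], [1, 1]])]
...     ],
...     columns=['task', 'worker', 'segmentation']
... )
>>> skills = pd.Series(
...     [0.9, 0.4, 0.4],
...     index=pd.Index(['p1', 'p2', 'p3'], name='worker')
... )
>>> result = SegmentationMajorityVote().fit_predict(df, skills)
>>> result.loc['t1']
array([[ True, False],
       [ True,  True]])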

D. Jung-Lin Lee, A. Das Sarma and A. Parameswaran. Aggregating Crowdsourced Image Segmentations. CEUR Workshop Proceedings. Vol. 2173, (2018), 1-44.

https://ceur-ws.org/Vol-2173/paper10.pdf

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> from crowdkit.aggregation import SegmentationMajorityVote
>>> df = pd.DataFrame(
...     [
...         ['t1', 'p1', np.array([[1, 0], [1, 1]])],
...         ['t1', 'p2', np.array([[0, 1], [1, 1]])],
...         ['t1', 'p3', np.array([[0, 1], [1, 1]])]
...     ],
...     columns=['task', 'worker', 'segmentation']
... )
>>> result = SegmentationMajorityVote().fit_predict(df)
Source code in crowdkit/aggregation/image_segmentation/segmentation_majority_vote.py
@attr.s
class SegmentationMajorityVote(BaseImageSegmentationAggregator):
    r"""The **Segmentation Majority Vote** algorithm chooses a pixel if and only if the pixel has "yes" votes
    from at least half of all workers.

    This method implements a straightforward approach to image segmentation aggregation:
    if a pixel is not inside a worker's segmentation, that worker's vote for the pixel is counted as 0.
    Otherwise, it is counted as 1. These categorical values are then aggregated
    for each pixel by the Majority Vote algorithm.

    The method also supports weighted majority voting when the `skills` parameter is provided to the `fit` method.

    D. Jung-Lin Lee, A. Das Sarma and A. Parameswaran. Aggregating Crowdsourced Image Segmentations.
    *CEUR Workshop Proceedings. Vol. 2173*, (2018), 1-44.

    <https://ceur-ws.org/Vol-2173/paper10.pdf>

    Examples:
        >>> import numpy as np
        >>> import pandas as pd
        >>> from crowdkit.aggregation import SegmentationMajorityVote
        >>> df = pd.DataFrame(
        ...     [
        ...         ['t1', 'p1', np.array([[1, 0], [1, 1]])],
        ...         ['t1', 'p2', np.array([[0, 1], [1, 1]])],
        ...         ['t1', 'p3', np.array([[0, 1], [1, 1]])]
        ...     ],
        ...     columns=['task', 'worker', 'segmentation']
        ... )
        >>> result = SegmentationMajorityVote().fit_predict(df)
    """

    default_skill: Optional[float] = attr.ib(default=None)
    """Default worker weight value."""

    on_missing_skill: str = attr.ib(default="error")
    """A value which specifies how to handle assignments performed by workers with an unknown skill.

    Possible values:
    * `error`: raises an exception if there is at least one assignment performed by a worker with an unknown skill;
    * `ignore`: drops assignments performed by workers with an unknown skill during prediction,
    raises an exception if there are no assignments with a known skill for any task;
    * `value`: the default value will be used if a skill is missing."""

    skills_: Optional["pd.Series[Any]"] = named_series_attrib(name="skill")
    """The workers' skills. The `pandas.Series` data is indexed by `worker` and has the corresponding worker skill."""

    def fit(
        self, data: pd.DataFrame, skills: Optional["pd.Series[Any]"] = None
    ) -> "SegmentationMajorityVote":
        """
        Fits the model to the training data.

        Args:
            data (DataFrame): The training dataset of workers' segmentations
                which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

            skills (Series): The workers' skills. The `pandas.Series` data is indexed by `worker`
                and has the corresponding worker skill.

        Returns:
            SegmentationMajorityVote: self.
        """

        data = data[["task", "worker", "segmentation"]]

        if skills is None:
            data["skill"] = 1
        else:
            data = add_skills_to_data(
                data, skills, self.on_missing_skill, self.default_skill
            )

        data["pixel_scores"] = data.segmentation * data.skill
        group = data.groupby("task")

        self.segmentations_ = (
            2 * group.pixel_scores.apply(np.sum) - group.skill.apply(np.sum)
        ).apply(lambda x: x >= 0)
        return self

    def fit_predict(
        self, data: pd.DataFrame, skills: Optional["pd.Series[Any]"] = None
    ) -> "pd.Series[Any]":
        """
        Fits the model to the training data and returns the aggregated segmentations.

        Args:
            data (DataFrame): The training dataset of workers' segmentations
                which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

            skills (Series): The workers' skills. The `pandas.Series` data is indexed by `worker`
                and has the corresponding worker skill.

        Returns:
            Series: Task segmentations. The `pandas.Series` data is indexed by `task`
                so that `segmentations.loc[task]` is the task aggregated segmentation.
        """

        return self.fit(data, skills).segmentations_

default_skill: Optional[float] = attr.ib(default=None)

Default worker weight value.

on_missing_skill: str = attr.ib(default='error')

A value which specifies how to handle assignments performed by workers with an unknown skill.

Possible values:

* error: raises an exception if there is at least one assignment performed by a worker with an unknown skill;
* ignore: drops assignments performed by workers with an unknown skill during prediction, and raises an exception if there are no assignments with a known skill for any task;
* value: the default value will be used if a skill is missing.
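
For instance, a configuration sketch (the values are illustrative): with on_missing_skill='value', workers absent from the skills Series fall back to default_skill instead of raising an error.

>>> agg = SegmentationMajorityVote(on_missing_skill='value', default_skill=0.5)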

skills_: Optional[pd.Series[Any]] = named_series_attrib(name='skill')

The workers' skills. The pandas.Series data is indexed by worker and has the corresponding worker skill.

fit(data, skills=None)

Fits the model to the training data.

Parameters:

    data (DataFrame, required): The training dataset of workers' segmentations,
        represented as the pandas.DataFrame data containing task, worker, and segmentation columns.

    skills (Series, default None): The workers' skills. The pandas.Series data is indexed by worker
        and has the corresponding worker skill.

Returns:

    SegmentationMajorityVote: self.

Source code in crowdkit/aggregation/image_segmentation/segmentation_majority_vote.py
def fit(
    self, data: pd.DataFrame, skills: Optional["pd.Series[Any]"] = None
) -> "SegmentationMajorityVote":
    """
    Fits the model to the training data.

    Args:
        data (DataFrame): The training dataset of workers' segmentations
            which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

        skills (Series): The workers' skills. The `pandas.Series` data is indexed by `worker`
            and has the corresponding worker skill.

    Returns:
        SegmentationMajorityVote: self.
    """

    data = data[["task", "worker", "segmentation"]]

    if skills is None:
        data["skill"] = 1
    else:
        data = add_skills_to_data(
            data, skills, self.on_missing_skill, self.default_skill
        )

    data["pixel_scores"] = data.segmentation * data.skill
    group = data.groupby("task")

    self.segmentations_ = (
        2 * group.pixel_scores.apply(np.sum) - group.skill.apply(np.sum)
    ).apply(lambda x: x >= 0)
    return self

fit_predict(data, skills=None)

Fits the model to the training data and returns the aggregated segmentations.

Parameters:

    data (DataFrame, required): The training dataset of workers' segmentations,
        represented as the pandas.DataFrame data containing task, worker, and segmentation columns.

    skills (Series, default None): The workers' skills. The pandas.Series data is indexed by worker
        and has the corresponding worker skill.

Returns:

    Series: Task segmentations. The pandas.Series data is indexed by task
        so that segmentations.loc[task] is the task aggregated segmentation.

Source code in crowdkit/aggregation/image_segmentation/segmentation_majority_vote.py
def fit_predict(
    self, data: pd.DataFrame, skills: Optional["pd.Series[Any]"] = None
) -> "pd.Series[Any]":
    """
    Fits the model to the training data and returns the aggregated segmentations.

    Args:
        data (DataFrame): The training dataset of workers' segmentations
            which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

        skills (Series): The workers' skills. The `pandas.Series` data is indexed by `worker`
            and has the corresponding worker skill.

    Returns:
        Series: Task segmentations. The `pandas.Series` data is indexed by `task`
            so that `segmentations.loc[task]` is the task aggregated segmentation.
    """

    return self.fit(data, skills).segmentations_

SegmentationRASA

Bases: BaseImageSegmentationAggregator

The Segmentation RASA (Reliability Aware Sequence Aggregation) algorithm chooses a pixel if the sum of the workers' weighted votes for that pixel is at least 0.5.

The Segmentation RASA algorithm consists of three steps:

1. Performs the weighted Majority Vote algorithm.
2. Calculates weights for each worker from the current Majority Vote estimation.
3. Repeats the first two steps until convergence or until the maximum number of iterations is reached.

The algorithm works iteratively. At each step, the workers are reweighted in proportion to their distances from the current answer estimation. The distance is calculated as \(1 - IOU\), where IOU (Intersection over Union) is the extent of overlap between two segmentation masks. This algorithm is a modification of the RASA method for texts; a sketch of the reweighting step follows below.
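
A minimal sketch of that reweighting step, assuming boolean masks; the function name iou_weights and the eps stabilizer below are illustrative, not part of the library API:

import numpy as np

def iou_weights(segmentations, estimate, eps=1e-9):
    # Distance of each worker's mask from the current estimate: 1 - IoU.
    intersection = (segmentations & estimate).sum(axis=(1, 2))
    union = (segmentations | estimate).sum(axis=(1, 2))
    distances = 1 - intersection / union
    # Closer masks get larger weights; eps keeps the logarithm finite at zero distance.
    weights = np.log(1 / (distances + eps) + 1)
    return weights / weights.sum()

The library's own _calculate_weights method (shown in the source below) follows the same scheme with its internal _EPS constant.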

J. Li, F. Fukumoto. A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation. Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP, (2019), 24-28.

https://doi.org/10.18653/v1/D19-5904

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> from crowdkit.aggregation import SegmentationRASA
>>> df = pd.DataFrame(
...     [
...         ['t1', 'p1', np.array([[1, 0], [1, 1]])],
...         ['t1', 'p2', np.array([[0, 1], [1, 1]])],
...         ['t1', 'p3', np.array([[0, 1], [1, 1]])]
...     ],
...     columns=['task', 'worker', 'segmentation']
... )
>>> result = SegmentationRASA().fit_predict(df)
Source code in crowdkit/aggregation/image_segmentation/segmentation_rasa.py
@attr.s
class SegmentationRASA(BaseImageSegmentationAggregator):
    r"""The **Segmentation RASA** (Reliability Aware Sequence Aggregation) algorithm chooses a pixel if the sum of the weighted votes of each worker is more than 0.5.

    The Segmentation RASA algorithm consists of three steps:
    1. Performs the weighted Majority Vote algorithm.
    2. Calculates weights for each worker from the current Majority Vote estimation.
    3. Repeats the first two steps until convergence or until the maximum number of iterations is reached.

    The algorithm works iteratively. At each step, the workers are reweighted in proportion to their distances
    from the current answer estimation. The distance is calculated as $1 - IOU$, where `IOU` (Intersection over Union) is the extent of overlap between two segmentation masks.
    This algorithm is a modification of the RASA method for texts.

    J. Li, F. Fukumoto. A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation.
    *Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP*, (2019), 24-28.

    <https://doi.org/10.18653/v1/D19-5904>

    Examples:
        >>> import numpy as np
        >>> import pandas as pd
        >>> from crowdkit.aggregation import SegmentationRASA
        >>> df = pd.DataFrame(
        ...     [
        ...         ['t1', 'p1', np.array([[1, 0], [1, 1]])],
        ...         ['t1', 'p2', np.array([[0, 1], [1, 1]])],
        ...         ['t1', 'p3', np.array([[0, 1], [1, 1]])]
        ...     ],
        ...     columns=['task', 'worker', 'segmentation']
        ... )
        >>> result = SegmentationRASA().fit_predict(df)
    """

    n_iter: int = attr.ib(default=10)
    """The maximum number of iterations."""

    tol: float = attr.ib(default=1e-5)
    """The tolerance stopping criterion for iterative methods with a variable number of steps.
    The algorithm converges when the loss change is less than the `tol` parameter."""

    weights_: npt.NDArray[Any] = attr.ib(init=False)
    """A list of workers' weights."""

    mv_: npt.NDArray[Any] = attr.ib(init=False)
    """The weighted task segmentations calculated with the Majority Vote algorithm."""

    loss_history_: List[float] = attr.ib(init=False)
    """A list of loss values during training."""

    @staticmethod
    def _segmentation_weighted(
        segmentations: "pd.Series[Any]", weights: npt.NDArray[Any]
    ) -> npt.NDArray[Any]:
        """
        Performs the weighted Majority Vote algorithm.

        From the weights of all workers and their segmentation, performs the
        weighted Majority Vote for the inclusion of each pixel in the answer.
        """
        weighted_segmentations = (weights * segmentations.T).T
        return cast(npt.NDArray[Any], weighted_segmentations.sum(axis=0))

    @staticmethod
    def _calculate_weights(
        segmentations: "pd.Series[Any]", mv: npt.NDArray[Any]
    ) -> npt.NDArray[Any]:
        """
        Calculates weights for each worker from the current Majority Vote estimation.
        """
        intersection = (segmentations & mv).astype(float)
        union = (segmentations | mv).astype(float)
        distances = 1 - intersection.sum(axis=(1, 2)) / union.sum(axis=(1, 2))  # type: ignore
        # add a small bias for more
        # numerical stability and correctness of transform.
        weights = np.log(1 / (distances + _EPS) + 1)
        return cast(npt.NDArray[Any], weights / np.sum(weights))

    def _aggregate_one(self, segmentations: "pd.Series[Any]") -> npt.NDArray[Any]:
        """
        Performs Segmentation RASA algorithm for a single image.
        """
        size = len(segmentations)
        segmentations_np = np.stack(segmentations.values)  # type: ignore
        weights = np.full(size, 1 / size)
        mv = self._segmentation_weighted(segmentations_np, weights)

        last_aggregated = None

        self.loss_history_ = []

        for _ in range(self.n_iter):
            weighted = self._segmentation_weighted(segmentations_np, weights)
            mv = weighted >= 0.5
            weights = self._calculate_weights(segmentations_np, mv)

            if last_aggregated is not None:
                delta = weighted - last_aggregated
                loss = (delta * delta).sum().sum() / (weighted * weighted).sum().sum()
                self.loss_history_.append(loss)

                if loss < self.tol:
                    break

            last_aggregated = weighted

        return mv

    def fit(self, data: pd.DataFrame) -> "SegmentationRASA":
        """Fits the model to the training data.

        Args:
            data (DataFrame): The training dataset of workers' segmentations
                which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

        Returns:
            SegmentationRASA: self.
        """

        data = data[["task", "worker", "segmentation"]]

        # The latest pandas version installable under Python 3.7 is pandas 1.1.5.
        # That version fails when a bound method is passed, raising
        # >>> TypeError: unhashable type: 'SegmentationRASA'
        # due to an inner logic that tries to hash it, so a lambda is used instead.
        aggregate_one = lambda arg: self._aggregate_one(arg)

        self.segmentations_ = data.groupby("task").segmentation.apply(aggregate_one)

        return self

    def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
        """Fits the model to the training data and returns the aggregated segmentations.

        Args:
            data (DataFrame): The training dataset of workers' segmentations
                which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

        Returns:
            Series: Task segmentations. The `pandas.Series` data is indexed by `task`
                so that `segmentations.loc[task]` is the task aggregated segmentation.
        """

        return self.fit(data).segmentations_

loss_history_: List[float] = attr.ib(init=False)

A list of loss values during training.

mv_: npt.NDArray[Any] = attr.ib(init=False)

The weighted task segmentations calculated with the Majority Vote algorithm.

n_iter: int = attr.ib(default=10)

The maximum number of iterations.

tol: float = attr.ib(default=1e-05)

The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the tol parameter.

weights_: npt.NDArray[Any] = attr.ib(init=False)

A list of workers' weights.

fit(data)

Fits the model to the training data.

Parameters:

    data (DataFrame, required): The training dataset of workers' segmentations,
        represented as the pandas.DataFrame data containing task, worker, and segmentation columns.

Returns:

    SegmentationRASA: self.

Source code in crowdkit/aggregation/image_segmentation/segmentation_rasa.py
def fit(self, data: pd.DataFrame) -> "SegmentationRASA":
    """Fits the model to the training data.

    Args:
        data (DataFrame): The training dataset of workers' segmentations
            which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

    Returns:
        SegmentationRASA: self.
    """

    data = data[["task", "worker", "segmentation"]]

    # The latest pandas version installable under Python 3.7 is pandas 1.1.5.
    # That version fails when a bound method is passed, raising
    # >>> TypeError: unhashable type: 'SegmentationRASA'
    # due to an inner logic that tries to hash it, so a lambda is used instead.
    aggregate_one = lambda arg: self._aggregate_one(arg)

    self.segmentations_ = data.groupby("task").segmentation.apply(aggregate_one)

    return self

fit_predict(data)

Fits the model to the training data and returns the aggregated segmentations.

Parameters:

    data (DataFrame, required): The training dataset of workers' segmentations,
        represented as the pandas.DataFrame data containing task, worker, and segmentation columns.

Returns:

    Series: Task segmentations. The pandas.Series data is indexed by task
        so that segmentations.loc[task] is the task aggregated segmentation.

Source code in crowdkit/aggregation/image_segmentation/segmentation_rasa.py
def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
    """Fits the model to the training data and returns the aggregated segmentations.

    Args:
        data (DataFrame): The training dataset of workers' segmentations
            which is represented as the `pandas.DataFrame` data containing `task`, `worker`, and `segmentation` columns.

    Returns:
        Series: Task segmentations. The `pandas.Series` data is indexed by `task`
            so that `segmentations.loc[task]` is the task aggregated segmentation.
    """

    return self.fit(data).segmentations_