Pairwise

Routines for pairwise aggregation.

BradleyTerry

Bases: BasePairwiseAggregator

Bradley-Terry model for pairwise comparisons.

The model implements the classic algorithm for aggregating pairwise comparisons: it builds a ranking of items from the outcomes of pairwise comparisons. Given a pair of items \(i\) and \(j\), the Bradley-Terry probabilistic model gives the probability that \(i\) is ranked higher than \(j\) as

$$ P(i > j) = \frac{p_i}{p_i + p_j}. $$

Here \(\boldsymbol{p}\) is a vector of positive real-valued parameters that the algorithm optimizes. The optimization maximizes the log-likelihood of the observed comparison outcomes using the MM algorithm:

$$ L(\boldsymbol{p}) = \sum_{i=1}^n\sum_{j=1}^n[w_{ij}\ln p_i - w_{ij}\ln (p_i + p_j)], $$

where \(w_{ij}\) denotes the number of comparisons of \(i\) and \(j\) "won" by \(i\).
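The two formulas above can be evaluated directly. The following is a minimal sketch in plain Python, not the library's implementation; the function names `bt_probability` and `bt_log_likelihood` and the nested-list win matrix are assumptions made for illustration.

```python
import math
from typing import Sequence


def bt_probability(p_i: float, p_j: float) -> float:
    """P(i > j) = p_i / (p_i + p_j) for positive parameters p_i, p_j."""
    return p_i / (p_i + p_j)


def bt_log_likelihood(p: Sequence[float], wins: Sequence[Sequence[int]]) -> float:
    """Log-likelihood L(p), where wins[i][j] is the number of times i beat j."""
    total = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if i != j and wins[i][j] > 0:
                total += wins[i][j] * (math.log(p[i]) - math.log(p[i] + p[j]))
    return total


# Illustrative data: item 0 beat item 1 three times, item 1 beat item 0 once.
wins = [[0, 3], [1, 0]]
print(bt_probability(0.75, 0.25))
print(bt_log_likelihood([0.75, 0.25], wins))
```

For this two-item data the parameters `[0.75, 0.25]` give a higher log-likelihood than the uniform `[0.5, 0.5]`, which matches the observed 3:1 win ratio.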

Note: The Bradley-Terry model needs the comparisons graph to be strongly connected.
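Whether a data set meets the connectivity requirement can be checked before fitting. The sketch below assumes the comparisons graph is the directed graph with an edge from each winner to the item it beat; that reading, and the helper name `is_strongly_connected`, are assumptions and not part of the library.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Comparison = Tuple[Hashable, Hashable, Hashable]  # (left, right, label)


def is_strongly_connected(comparisons: Iterable[Comparison]) -> bool:
    """Check that every item can reach every other item along winner -> loser edges."""
    forward: Dict[Hashable, Set[Hashable]] = defaultdict(set)   # winner -> losers
    backward: Dict[Hashable, Set[Hashable]] = defaultdict(set)  # loser -> winners
    nodes: Set[Hashable] = set()
    for left, right, label in comparisons:
        loser = right if label == left else left
        forward[label].add(loser)
        backward[loser].add(label)
        nodes.update((left, right))
    if not nodes:
        return True

    def reachable(start: Hashable, edges: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
        # Iterative depth-first search over the given edge map.
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    start = next(iter(nodes))
    # Strongly connected iff one node reaches all nodes in the graph and in its reverse.
    return reachable(start, forward) == nodes and reachable(start, backward) == nodes
```

Under this reading, a data set in which some item never loses (or never wins) cannot be strongly connected.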

Hunter, D. R. MM algorithms for generalized Bradley-Terry models. Ann. Statist., Vol. 32, No. 1 (2004): 384–406.

Bradley, R. A. and Terry, M. E. Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, Vol. 39 (1952): 324–345.

Examples:

The Bradley-Terry model needs the data to be a DataFrame containing the columns left, right, and label. The left and right columns contain the identifiers of the left and right items, respectively; label contains the identifier of the item that won the comparison.

>>> import pandas as pd
>>> from crowdkit.aggregation import BradleyTerry
>>> df = pd.DataFrame(
...     [
...         ['item1', 'item2', 'item1'],
...         ['item2', 'item3', 'item2']
...     ],
...     columns=['left', 'right', 'label']
... )
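Rows of this shape reduce to win counts, which is the only information the model uses. The sketch below is a plain-Python illustration of that reduction, including the check that each label names one of the two compared items; `count_wins` is a hypothetical helper, not a library function.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Comparison = Tuple[Hashable, Hashable, Hashable]  # (left, right, label)


def count_wins(rows: Iterable[Comparison]) -> Counter:
    """Map (winner, loser) pairs to the number of comparisons won."""
    wins: Counter = Counter()
    for left, right, label in rows:
        if label != left and label != right:
            raise ValueError(f"label {label!r} is neither {left!r} nor {right!r}")
        loser = right if label == left else left
        wins[(label, loser)] += 1
    return wins


rows = [
    ("item1", "item2", "item1"),
    ("item2", "item3", "item2"),
]
print(count_wins(rows))
```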
Source code in crowdkit/aggregation/pairwise/bradley_terry.py
@attr.s
class BradleyTerry(BasePairwiseAggregator):
    r"""Bradley-Terry model for pairwise comparisons.

    The model implements the classic algorithm for aggregating pairwise comparisons.
    The algorithm constructs an items' ranking based on pairwise comparisons. Given
    a pair of two items $i$ and $j$, the probability of $i$ to be ranked higher is,
    according to the Bradley-Terry's probabilistic model,
    $$
    P(i > j) = \frac{p_i}{p_i + p_j}.
    $$
    Here $\boldsymbol{p}$ is a vector of positive real-valued parameters that the algorithm optimizes. The
    optimization process maximizes the log-likelihood of the observed comparison outcomes using the MM algorithm:
    $$
    L(\boldsymbol{p}) = \sum_{i=1}^n\sum_{j=1}^n[w_{ij}\ln p_i - w_{ij}\ln (p_i + p_j)],
    $$
    where $w_{ij}$ denotes the number of comparisons of $i$ and $j$ "won" by $i$.

    {% note info %}

    The Bradley-Terry model needs the comparisons graph to be **strongly connected**.

    {% endnote %}

    David R. Hunter.
    MM algorithms for generalized Bradley-Terry models
    *Ann. Statist.*, Vol. 32, 1 (2004): 384–406.

    Bradley, R. A. and Terry, M. E.
    Rank analysis of incomplete block designs. I. The method of paired comparisons.
    *Biometrika*, Vol. 39 (1952): 324–345.

    Examples:
        The Bradley-Terry model needs the data to be a `DataFrame` containing columns
        `left`, `right`, and `label`. `left` and `right` contain identifiers of left and
        right items respectively, `label` contains identifiers of items that won these
        comparisons.

        >>> import pandas as pd
        >>> from crowdkit.aggregation import BradleyTerry
        >>> df = pd.DataFrame(
        ...     [
        ...         ['item1', 'item2', 'item1'],
        ...         ['item2', 'item3', 'item2']
        ...     ],
        ...     columns=['left', 'right', 'label']
        ... )
    """

    n_iter: int = attr.ib()
    """The number of optimization iterations."""

    tol: float = attr.ib(default=1e-5)
    """The tolerance stopping criterion for iterative methods with a variable number of steps.
    The algorithm converges when the loss change is less than the `tol` parameter."""

    loss_history_: List[float] = attr.ib(init=False)
    """A list of loss values during training."""

    def fit(self, data: pd.DataFrame) -> "BradleyTerry":
        """Args:
            data (DataFrame): Workers' pairwise comparison results.
                A pandas.DataFrame containing `left`, `right`, and `label` columns;
                a `worker` column may be present but is not used by this model.
                For each row, `label` must be equal to the value in either the `left` or the `right` column.

        Returns:
            BradleyTerry: self.
        """

        M, unique_labels = self._build_win_matrix(data)

        if not unique_labels.size:
            self.scores_ = pd.Series([], dtype=np.float64)
            return self

        T: npt.NDArray[np.int_] = M.T + M
        active: npt.NDArray[np.bool_] = T > 0

        w = M.sum(axis=1)

        Z = np.zeros_like(M, dtype=float)

        p = np.ones(M.shape[0])
        p_new = p.copy() / p.sum()

        p_old = None

        self.loss_history_ = []

        for _ in range(self.n_iter):
            P: npt.NDArray[np.float_] = np.broadcast_to(p, M.shape)

            Z[active] = T[active] / (P[active] + P.T[active])

            p_new[:] = w
            p_new /= Z.sum(axis=0)
            p_new /= p_new.sum()
            p[:] = p_new

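            if p_old is not None:
                loss = np.abs(p_new - p_old).sum()
                self.loss_history_.append(float(loss))

                if loss < self.tol:
                    break

            # Copy the values: `p_new` is updated in place on the next iteration,
            # so keeping a plain reference would make the next loss exactly zero.
            p_old = p_new.copy()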

        self.scores_ = pd.Series(p_new, index=unique_labels)

        return self

    def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
        """Args:
            data (DataFrame): Workers' pairwise comparison results.
                A pandas.DataFrame containing `left`, `right`, and `label` columns;
                a `worker` column may be present but is not used by this model.
                For each row, `label` must be equal to the value in either the `left` or the `right` column.

        Returns:
            Series: Label scores.
                A pandas.Series indexed by labels and holding the corresponding label scores.
        """
        return self.fit(data).scores_

    @staticmethod
    def _build_win_matrix(
        data: pd.DataFrame,
    ) -> Tuple[npt.NDArray[np.int_], npt.NDArray[np.int_]]:
        data = data[["left", "right", "label"]]

        unique_labels, np_data = np.unique(data.values, return_inverse=True)
        np_data = np_data.reshape(data.shape)

        left_wins = np_data[np_data[:, 0] == np_data[:, 2], :2].T
        right_wins = np_data[np_data[:, 1] == np_data[:, 2], 1::-1].T

        win_matrix = np.zeros((unique_labels.size, unique_labels.size), dtype="int")

        np.add.at(win_matrix, tuple(left_wins), 1)
        np.add.at(win_matrix, tuple(right_wins), 1)

        return win_matrix, unique_labels

loss_history_: List[float] = attr.ib(init=False) class-attribute instance-attribute

A list of loss values during training.

n_iter: int = attr.ib() class-attribute instance-attribute

The number of optimization iterations.

tol: float = attr.ib(default=1e-05) class-attribute instance-attribute

The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the tol parameter.
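How `n_iter` and `tol` interact can be seen in a plain-Python sketch of the MM update. It mirrors the structure of the `fit` source listed on this page, where the monitored quantity is the summed absolute change of the parameter vector between iterations, but it is not the library code: the function name `mm_bradley_terry`, the nested-list input, and the uniform starting point are assumptions, and every item is assumed to take part in at least one comparison.

```python
from typing import List, Sequence


def mm_bradley_terry(
    wins: Sequence[Sequence[int]], n_iter: int = 100, tol: float = 1e-5
) -> List[float]:
    """Estimate Bradley-Terry parameters; wins[i][j] counts how often i beat j."""
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p_new = []
        for i in range(n):
            w_i = sum(wins[i])  # total number of comparisons won by i
            # Sum over opponents j of (games between i and j) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i and wins[i][j] + wins[j][i] > 0
            )
            p_new.append(w_i / denom)
        total = sum(p_new)
        p_new = [value / total for value in p_new]  # normalize to sum to 1
        change = sum(abs(a - b) for a, b in zip(p_new, p))
        p = p_new
        if change < tol:  # tolerance stopping criterion
            break
    return p


print(mm_bradley_terry([[0, 3], [1, 0]]))
```

With two items the update reaches the 3:1 win ratio after the first step and stops on the second, because the parameters no longer change; with more items `tol` usually ends the loop well before a large `n_iter` is exhausted.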

fit(data)

Parameters:

Name Type Description Default
data DataFrame

Workers' pairwise comparison results. A pandas.DataFrame containing left, right, and label columns; a worker column may be present but is not used by this model. For each row, label must be equal to the value in either the left or the right column.

required

Returns:

Name Type Description
BradleyTerry BradleyTerry

self.

Source code in crowdkit/aggregation/pairwise/bradley_terry.py
def fit(self, data: pd.DataFrame) -> "BradleyTerry":
    """Args:
        data (DataFrame): Workers' pairwise comparison results.
            A pandas.DataFrame containing `left`, `right`, and `label` columns;
            a `worker` column may be present but is not used by this model.
            For each row, `label` must be equal to the value in either the `left` or the `right` column.

    Returns:
        BradleyTerry: self.
    """

    M, unique_labels = self._build_win_matrix(data)

    if not unique_labels.size:
        self.scores_ = pd.Series([], dtype=np.float64)
        return self

    T: npt.NDArray[np.int_] = M.T + M
    active: npt.NDArray[np.bool_] = T > 0

    w = M.sum(axis=1)

    Z = np.zeros_like(M, dtype=float)

    p = np.ones(M.shape[0])
    p_new = p.copy() / p.sum()

    p_old = None

    self.loss_history_ = []

    for _ in range(self.n_iter):
        P: npt.NDArray[np.float_] = np.broadcast_to(p, M.shape)

        Z[active] = T[active] / (P[active] + P.T[active])

        p_new[:] = w
        p_new /= Z.sum(axis=0)
        p_new /= p_new.sum()
        p[:] = p_new

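        if p_old is not None:
            loss = np.abs(p_new - p_old).sum()
            self.loss_history_.append(float(loss))

            if loss < self.tol:
                break

        # Copy the values: `p_new` is updated in place on the next iteration,
        # so keeping a plain reference would make the next loss exactly zero.
        p_old = p_new.copy()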

    self.scores_ = pd.Series(p_new, index=unique_labels)

    return self

fit_predict(data)

Parameters:

Name Type Description Default
data DataFrame

Workers' pairwise comparison results. A pandas.DataFrame containing left, right, and label columns; a worker column may be present but is not used by this model. For each row, label must be equal to the value in either the left or the right column.

required

Returns:

Name Type Description
Series Series[Any]

Label scores. A pandas.Series indexed by labels and holding the corresponding label scores.
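The returned scores can be turned into a ranking by sorting, and two Bradley-Terry scores can be plugged back into the model formula to estimate a win probability. The sketch below works on a plain dictionary; the score values are hypothetical and `rank_items` is an illustrative helper, not part of the library.

```python
from typing import Dict, Hashable, List


def rank_items(scores: Dict[Hashable, float]) -> List[Hashable]:
    """Return item identifiers ordered from the highest to the lowest score."""
    return sorted(scores, key=lambda item: scores[item], reverse=True)


# Hypothetical scores, e.g. as obtained from a fitted model via Series.to_dict().
scores = {"item2": 0.3, "item1": 0.6, "item3": 0.1}

ranking = rank_items(scores)
# Bradley-Terry estimate of the probability that item1 beats item2.
p_item1_beats_item2 = scores["item1"] / (scores["item1"] + scores["item2"])
print(ranking, p_item1_beats_item2)
```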

Source code in crowdkit/aggregation/pairwise/bradley_terry.py
def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
    """Args:
        data (DataFrame): Workers' pairwise comparison results.
            A pandas.DataFrame containing `left`, `right`, and `label` columns;
            a `worker` column may be present but is not used by this model.
            For each row, `label` must be equal to the value in either the `left` or the `right` column.

    Returns:
        Series: Label scores.
            A pandas.Series indexed by labels and holding the corresponding label scores.
    """
    return self.fit(data).scores_

NoisyBradleyTerry

Bases: BasePairwiseAggregator

Bradley-Terry model for pairwise comparisons with additional parameters.

This model is a modification of the BradleyTerry model with parameters for workers' skills (reliability) and biases.
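The role of the skill and bias parameters can be read off the `_compute_log_likelihood` source listed on this page: each observed choice is modeled as a mixture of the Bradley-Terry outcome and a worker-specific preference. The sketch below restates that per-comparison term in plain Python; it omits the regularization term, and the function and argument names are illustrative assumptions.

```python
import math


def expit(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def noisy_bt_choice_probability(
    s_left: float, s_right: float, q: float, gamma: float, left_won: bool
) -> float:
    """Probability of the observed choice for one comparison by one worker.

    s_left, s_right: unnormalized item scores; q: worker bias parameter;
    gamma: worker skill parameter (expit(gamma) is the worker's reliability).
    """
    y = 1.0 if left_won else -1.0
    skill = expit(gamma)
    # With probability `skill` the worker follows the item scores,
    # otherwise the choice is driven by the worker's bias.
    return skill * expit(y * (s_left - s_right)) + (1.0 - skill) * expit(y * q)


print(noisy_bt_choice_probability(1.0, 0.0, q=0.0, gamma=2.0, left_won=True))
```

For a highly reliable worker (large `gamma`) the expression approaches the plain logistic comparison of the two item scores; for an unreliable one it approaches `expit(y * q)`, which depends only on the bias.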

Examples:

The following example shows how to aggregate results of comparisons grouped by some column. In the example, the two questions q1 and q2 are used to group the labeled data. A temporary data structure is created, and the model is applied to it. The results are then split into two arrays, each containing the scores for one of the initial groups.

>>> import pandas as pd
>>> from crowdkit.aggregation import NoisyBradleyTerry
>>> data = pd.DataFrame(
...     [
...         ['q1', 'w1', 'a', 'b', 'a'],
...         ['q1', 'w2', 'a', 'b', 'b'],
...         ['q1', 'w3', 'a', 'b', 'a'],
...         ['q2', 'w1', 'a', 'b', 'b'],
...         ['q2', 'w2', 'a', 'b', 'a'],
...         ['q2', 'w3', 'a', 'b', 'b'],
...     ],
...     columns=['question', 'worker', 'left', 'right', 'label']
... )
>>> # Append question to other columns. After that the data looks like:
>>> #   question worker     left    right    label
>>> # 0       q1     w1  (q1, a)  (q1, b)  (q1, a)
>>> for col in 'left', 'right', 'label':
...     data[col] = list(zip(data['question'], data[col]))
>>> result = NoisyBradleyTerry(n_iter=10).fit_predict(data)
>>> # Separate results
>>> result.index = pd.MultiIndex.from_tuples(result.index, names=['question', 'label'])
>>> print(result['q1'])      # Scores for all items in the q1 question
>>> print(result['q2']['b']) # Score for the item b in the q2 question
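The grouping trick in this example does not depend on pandas: each item identifier is replaced by a `(question, item)` tuple so that items from different questions never collide, and the scores are split by the first tuple element afterwards. The plain-Python sketch below shows both steps; the score values are hypothetical placeholders, not model output.

```python
from typing import Dict, Tuple

rows = [
    ("q1", "w1", "a", "b", "a"),
    ("q2", "w1", "a", "b", "b"),
]

# Step 1: pair every item identifier with its question.
keyed_rows = [
    (worker, (question, left), (question, right), (question, label))
    for question, worker, left, right, label in rows
]

# Step 2: split scores keyed by (question, item) into one mapping per question.
# These numbers are made up for illustration only.
scores: Dict[Tuple[str, str], float] = {
    ("q1", "a"): 0.7,
    ("q1", "b"): 0.3,
    ("q2", "a"): 0.4,
    ("q2", "b"): 0.6,
}
by_question: Dict[str, Dict[str, float]] = {}
for (question, item), score in scores.items():
    by_question.setdefault(question, {})[item] = score

print(keyed_rows[0])
print(by_question["q2"]["b"])
```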
Source code in crowdkit/aggregation/pairwise/noisy_bt.py
@attr.s
class NoisyBradleyTerry(BasePairwiseAggregator):
    r"""Bradley-Terry model for pairwise comparisons with additional parameters.

    This model is a modification of the BradleyTerry model with parameters
    for workers' skills (reliability) and biases.

    Examples:
        The following example shows how to aggregate results of comparisons **grouped by some column**.
        In the example, the two questions `q1` and `q2` are used to group the labeled data.
        A temporary data structure is created, and the model is applied to it.
        The results are then split into two arrays, each containing the scores for one of the initial groups.

        >>> import pandas as pd
        >>> from crowdkit.aggregation import NoisyBradleyTerry
        >>> data = pd.DataFrame(
        ...     [
        ...         ['q1', 'w1', 'a', 'b', 'a'],
        ...         ['q1', 'w2', 'a', 'b', 'b'],
        ...         ['q1', 'w3', 'a', 'b', 'a'],
        ...         ['q2', 'w1', 'a', 'b', 'b'],
        ...         ['q2', 'w2', 'a', 'b', 'a'],
        ...         ['q2', 'w3', 'a', 'b', 'b'],
        ...     ],
        ...     columns=['question', 'worker', 'left', 'right', 'label']
        ... )
        >>> # Append question to other columns. After that the data looks like:
        >>> #   question worker     left    right    label
        >>> # 0       q1     w1  (q1, a)  (q1, b)  (q1, a)
        >>> for col in 'left', 'right', 'label':
        ...     data[col] = list(zip(data['question'], data[col]))
        >>> result = NoisyBradleyTerry(n_iter=10).fit_predict(data)
        >>> # Separate results
        >>> result.index = pd.MultiIndex.from_tuples(result.index, names=['question', 'label'])
        >>> print(result['q1'])      # Scores for all items in the q1 question
        >>> print(result['q2']['b']) # Score for the item b in the q2 question
    """

    n_iter: int = attr.ib(default=100)
    """The number of optimization iterations."""

    tol: float = attr.ib(default=1e-5)
    """The tolerance stopping criterion for iterative methods with a variable number of steps.
    The algorithm converges when the loss change is less than the `tol` parameter."""

    regularization_ratio: float = attr.ib(default=1e-5)
    """The regularization ratio."""

    random_state: int = attr.ib(default=0)
    """The state of the random number generator."""

    skills_: "pd.Series[Any]" = named_series_attrib(name="skill")
    """A pandas.Series indexed by workers and holding the corresponding worker skills."""

    biases_: "pd.Series[Any]" = named_series_attrib(name="bias")
    """Predicted biases for each worker. Each bias indicates the probability that the worker chooses the left item.
    A series of worker biases indexed by workers."""

    def fit(self, data: pd.DataFrame) -> "NoisyBradleyTerry":
        """Args:
            data (DataFrame): Workers' pairwise comparison results.
                A pandas.DataFrame containing `worker`, `left`, `right`, and `label` columns.
                For each row, `label` must be equal to the value in either the `left` or the `right` column.

        Returns:
            NoisyBradleyTerry: self.
        """

        unique_labels, np_data = factorize(data[["left", "right", "label"]].values)
        unique_workers, np_workers = factorize(data.worker.values)  # type: ignore
        np.random.seed(self.random_state)
        x_0 = np.random.rand(1 + unique_labels.size + 2 * unique_workers.size)
        np_data += 1

        x = minimize(
            self._compute_log_likelihood,
            x_0,
            jac=self._compute_gradient,
            args=(
                np_data,
                np_workers,
                unique_labels.size,
                unique_workers.size,
                self.regularization_ratio,
            ),
            method="L-BFGS-B",
            options={"maxiter": self.n_iter, "ftol": np.float32(self.tol)},
        )

        biases_begin = unique_labels.size + 1
        workers_begin = biases_begin + unique_workers.size

        self.scores_ = pd.Series(
            expit(x.x[1:biases_begin]),
            index=pd.Index(unique_labels, name="label"),
            name="score",
        )
        self.biases_ = pd.Series(
            expit(x.x[biases_begin:workers_begin]), index=unique_workers
        )
        self.skills_ = pd.Series(expit(x.x[workers_begin:]), index=unique_workers)

        return self

    def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
        """Args:
            data (DataFrame): Workers' pairwise comparison results.
                A pandas.DataFrame containing `worker`, `left`, `right`, and `label` columns.
                For each row, `label` must be equal to the value in either the `left` or the `right` column.

        Returns:
            Series: Label scores.
                A pandas.Series indexed by labels and holding the corresponding label scores.
        """
        return self.fit(data).scores_

    @staticmethod
    def _compute_log_likelihood(
        x: npt.NDArray[Any],
        np_data: npt.NDArray[Any],
        np_workers: npt.NDArray[Any],
        labels: int,
        workers: int,
        regularization_ratio: float,
    ) -> float:
        s_i = x[np_data[:, 0]]
        s_j = x[np_data[:, 1]]
        y = np.zeros_like(np_data[:, 2])
        y[np_data[:, 0] == np_data[:, 2]] = 1
        y[np_data[:, 0] != np_data[:, 2]] = -1
        q = x[1 + np_workers + labels]
        gamma = x[1 + np_workers + labels + workers]

        total = np.sum(
            np.log(
                expit(gamma) * expit(y * (s_i - s_j))
                + (1 - expit(gamma)) * expit(y * q)
            )
        )
        reg = np.sum(np.log(expit(x[0] - x[1 : labels + 1]))) + np.sum(
            np.log(expit(x[1 : labels + 1] - x[0]))
        )

        return float(-total + regularization_ratio * reg)

    @staticmethod
    def _compute_gradient(
        x: npt.NDArray[Any],
        np_data: npt.NDArray[Any],
        np_workers: npt.NDArray[Any],
        labels: int,
        workers: int,
        regularization_ratio: float,
    ) -> npt.NDArray[Any]:
        gradient = np.zeros_like(x)

        for worker_idx, (left_idx, right_idx, label) in zip(np_workers, np_data):
            s_i = x[left_idx]
            s_j = x[right_idx]
            y = 1 if label == left_idx else -1
            q = x[1 + labels + worker_idx]
            gamma = x[1 + labels + workers + worker_idx]

            # We'll use autograd in the future
            gradient[left_idx] += (y * np.exp(y * (-(s_i - s_j)))) / (
                (np.exp(-gamma) + 1)
                * (np.exp(y * (-(s_i - s_j))) + 1) ** 2
                * (
                    1 / ((np.exp(-gamma) + 1) * (np.exp(y * (-(s_i - s_j))) + 1))
                    + (1 - 1 / (np.exp(-gamma) + 1)) / (np.exp(-q * y) + 1)
                )
            )  # noqa
            gradient[right_idx] += -(
                y * (np.exp(q * y) + 1) * np.exp(y * (s_i - s_j) + gamma)
            ) / (
                (np.exp(y * (s_i - s_j)) + 1)
                * (
                    np.exp(y * (s_i - s_j) + gamma + q * y)
                    + np.exp(y * (s_i - s_j) + gamma)
                    + np.exp(y * (s_i - s_j) + q * y)
                    + np.exp(q * y)
                )
            )  # noqa
            gradient[labels + worker_idx] = (
                y * np.exp(q * y) * (np.exp(s_i * y) + np.exp(s_j * y))
            ) / (
                (np.exp(q * y) + 1)
                * (
                    np.exp(y * (s_i + q) + gamma)
                    + np.exp(s_i * y + gamma)
                    + np.exp(y * (s_i + q))
                    + np.exp(y * (s_j + q))
                )
            )  # noqa #dq
            gradient[labels + workers + worker_idx] = (
                np.exp(gamma) * (np.exp(s_i * y) - np.exp(y * (s_j + q)))
            ) / (
                (np.exp(gamma) + 1)
                * (
                    np.exp(y * (s_i + q) + gamma)
                    + np.exp(s_i * y + gamma)
                    + np.exp(y * (s_i + q))
                    + np.exp(y * (s_j + q))
                )
            )  # noqa #dgamma

        gradient[1 : labels + 1] -= regularization_ratio * np.tanh(
            (x[1 : labels + 1] - x[0]) / 2.0
        )
        gradient[0] += regularization_ratio * np.sum(
            np.tanh((x[1 : labels + 1] - x[0]) / 2.0)
        )
        return -gradient

biases_: pd.Series[Any] = named_series_attrib(name='bias') class-attribute instance-attribute

Predicted biases for each worker. Each bias indicates the probability that the worker chooses the left item. A series of worker biases indexed by workers.

n_iter: int = attr.ib(default=100) class-attribute instance-attribute

The number of optimization iterations.

random_state: int = attr.ib(default=0) class-attribute instance-attribute

The state of the random number generator.

regularization_ratio: float = attr.ib(default=1e-05) class-attribute instance-attribute

The regularization ratio.

skills_: pd.Series[Any] = named_series_attrib(name='skill') class-attribute instance-attribute

A pandas.Series indexed by workers and holding the corresponding worker skills.

tol: float = attr.ib(default=1e-05) class-attribute instance-attribute

The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the tol parameter.

fit(data)

Parameters:

Name Type Description Default
data DataFrame

Workers' pairwise comparison results. A pandas.DataFrame containing worker, left, right, and label columns. For each row, label must be equal to the value in either the left or the right column.

required

Returns:

Name Type Description
NoisyBradleyTerry NoisyBradleyTerry

self.
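The `fit` source on this page optimizes a single flat parameter vector and then slices it: one leading entry (used by the regularization term), one entry per label, one bias per worker, and one skill per worker, each mapped through the logistic function. The sketch below reproduces only that slicing in plain Python; `split_parameters` is a hypothetical helper, and the input vector is made up.

```python
import math
from typing import List, Sequence, Tuple


def expit(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def split_parameters(
    x: Sequence[float], n_labels: int, n_workers: int
) -> Tuple[List[float], List[float], List[float]]:
    """Split a flat vector of size 1 + n_labels + 2 * n_workers into scores, biases, skills."""
    if len(x) != 1 + n_labels + 2 * n_workers:
        raise ValueError("unexpected parameter vector length")
    biases_begin = 1 + n_labels
    workers_begin = biases_begin + n_workers
    scores = [expit(v) for v in x[1:biases_begin]]
    biases = [expit(v) for v in x[biases_begin:workers_begin]]
    skills = [expit(v) for v in x[workers_begin:]]
    return scores, biases, skills


# Made-up vector for 2 labels and 3 workers.
x = [0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0]
scores, biases, skills = split_parameters(x, n_labels=2, n_workers=3)
print(scores, biases, skills)
```

Because every slice passes through the logistic function, all reported scores, biases, and skills lie strictly between 0 and 1.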

Source code in crowdkit/aggregation/pairwise/noisy_bt.py
def fit(self, data: pd.DataFrame) -> "NoisyBradleyTerry":
    """Args:
        data (DataFrame): Workers' pairwise comparison results.
            A pandas.DataFrame containing `worker`, `left`, `right`, and `label` columns.
            For each row, `label` must be equal to the value in either the `left` or the `right` column.

    Returns:
        NoisyBradleyTerry: self.
    """

    unique_labels, np_data = factorize(data[["left", "right", "label"]].values)
    unique_workers, np_workers = factorize(data.worker.values)  # type: ignore
    np.random.seed(self.random_state)
    x_0 = np.random.rand(1 + unique_labels.size + 2 * unique_workers.size)
    np_data += 1

    x = minimize(
        self._compute_log_likelihood,
        x_0,
        jac=self._compute_gradient,
        args=(
            np_data,
            np_workers,
            unique_labels.size,
            unique_workers.size,
            self.regularization_ratio,
        ),
        method="L-BFGS-B",
        options={"maxiter": self.n_iter, "ftol": np.float32(self.tol)},
    )

    biases_begin = unique_labels.size + 1
    workers_begin = biases_begin + unique_workers.size

    self.scores_ = pd.Series(
        expit(x.x[1:biases_begin]),
        index=pd.Index(unique_labels, name="label"),
        name="score",
    )
    self.biases_ = pd.Series(
        expit(x.x[biases_begin:workers_begin]), index=unique_workers
    )
    self.skills_ = pd.Series(expit(x.x[workers_begin:]), index=unique_workers)

    return self

fit_predict(data)

Parameters:

Name Type Description Default
data DataFrame

Workers' pairwise comparison results. A pandas.DataFrame containing worker, left, right, and label columns. For each row, label must be equal to the value in either the left or the right column.

required

Returns:

Name Type Description
Series Series[Any]

Label scores. A pandas.Series indexed by labels and holding the corresponding label scores.

Source code in crowdkit/aggregation/pairwise/noisy_bt.py
def fit_predict(self, data: pd.DataFrame) -> "pd.Series[Any]":
    """Args:
        data (DataFrame): Workers' pairwise comparison results.
            A pandas.DataFrame containing `worker`, `left`, `right`, and `label` columns.
            For each row, `label` must be equal to the value in either the `left` or the `right` column.

    Returns:
        Series: Label scores.
            A pandas.Series indexed by labels and holding the corresponding label scores.
    """
    return self.fit(data).scores_