This example demonstrates Crowdom's data labeling workflow for annotation tasks.

In annotation tasks, unlike classification tasks, the number of possible solutions is "unlimited" (compared to the fixed label set of classification).

Quality control measures also differ in this case: instead of using control tasks, as in classification, we ask other workers to check the received solutions (annotations).

As an example of the annotation workflow, we chose an audio transcription task: workers write down the words they hear in the audio.

If this is your first time working with the Crowdom workflow structure, see the image classification workflow example.

Environment setup

In [ ]:
%pip install crowdom
In [ ]:
from datetime import timedelta
import os
import pandas as pd

import toloka.client as toloka

from crowdom import base, datasource, client, objects, pricing, params as labeling_params

Logging customization

In [2]:
import yaml
import logging.config
In [3]:
with open('logging.yaml') as f:
    logging.config.dictConfig(yaml.full_load(f.read()))
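The contents of logging.yaml are not shown in this example. As an assumption, a minimal configuration could look like the dictionary below, which is what yaml.full_load would return for an equivalent YAML file. The logger names `crowdom` and `toloka` and the format string are illustrative choices, picked so that the output resembles the log lines printed later in this notebook.

```python
import logging
import logging.config

# Hypothetical equivalent of logging.yaml; the real file is not shown in this example.
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            # Produces lines like "<time> - <logger>:<function>:<line> - INFO: - <message>"
            'format': '%(asctime)s - %(name)s:%(funcName)s:%(lineno)d - %(levelname)s: - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'simple',
            'level': 'INFO',
        },
    },
    'loggers': {
        # Assumed logger names for Crowdom modules and the Toloka client.
        'crowdom': {'handlers': ['console'], 'level': 'INFO', 'propagate': False},
        'toloka': {'handlers': ['console'], 'level': 'INFO', 'propagate': False},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
```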

Crowdsourcing platform authorization

In [4]:
from IPython.display import clear_output, display
In [5]:
token = os.getenv('TOLOKA_TOKEN') or input('Enter your token: ')
clear_output()

Authorization

In [6]:
toloka_client = client.create_toloka_client(token=token)

Test environment

In [ ]:
toloka_client = client.create_toloka_client(token=token, environment=toloka.TolokaClient.Environment.SANDBOX)

Labeling task definition

We are dealing with an annotation task in which we transcribe Audio into Text:

In [8]:
annotation_function = base.AnnotationFunction(
    inputs=(objects.Audio,),
    outputs=(objects.Text,)
)

Worker interface preview

In [9]:
example_url = 'https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_299.wav'
example_audio = (objects.Audio(url=example_url),)

client.TaskPreview(example_audio, task_function=annotation_function, lang='EN').display_link()
Out[9]:

Worker instructions

In [10]:
instruction = {
    # RU: "Write down the words spoken in the audio, without punctuation or capital letters."
    'RU': 'Запишите звучащие на аудио слова, без знаков препинания и заглавных букв.',
    'EN': 'Transcribe the audio, without any punctuation or capitalization.'}

Labeling task specification

In [11]:
task_spec = base.TaskSpec(
    id='audio-transcription',
    function=annotation_function,
    name={'EN': 'Audio transcription', 'RU': 'Расшифровка аудио'},
    description={'EN': 'Transcribe short audios', 'RU': 'Расшифровка коротких аудио'},
    instruction=instruction)

Depending on where they do tasks, workers will see the EN version of your task in their task feed like this:

Task card previews: Browser (task-card-browser) and Mobile app (task-card-app).

Language of your data:

In [12]:
lang = 'EN'

Localized version of task_spec:

In [13]:
task_spec_en = client.AnnotationTaskSpec(task_spec, lang)

Importing source data

The expected file format is a JSON list of objects; each object has the keys listed in the name column, with values of the corresponding type. As in the image classification example, media types such as Audio are expected as URLs.

In [14]:
datasource.file_format(task_spec_en.task_mapping)
Out[14]:
name type
0 audio str
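A minimal sketch of an input file matching this format, assuming one object per audio with a single `audio` key holding the URL. The file name tasks_example.json is hypothetical and chosen so that an existing tasks.json is not overwritten.

```python
import json

# Hypothetical example input: a JSON list with one object per task.
# The "audio" key matches the name column above; its value is a URL string.
example_tasks = [
    {'audio': 'https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_299.wav'},
    {'audio': 'https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_398.wav'},
]

with open('tasks_example.json', 'w') as f:
    json.dump(example_tasks, f, indent=2)
```

To try the flow on these two clips, pass the path to the same reader: `datasource.read_tasks('tasks_example.json', task_spec_en.task_mapping)`.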
In [15]:
input_objects = datasource.read_tasks('tasks.json', task_spec_en.task_mapping)
control_objects = None

Reference labeling

In addition to the source data, the file with control objects is expected to contain reference labeling. For our task, the reference labeling is the correct transcription, stored in the text field.

In [16]:
datasource.file_format(task_spec_en.task_mapping, has_solutions=True)
Out[16]:
name type
0 audio str
1 text str
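A similar sketch for the reference file, assuming each object carries both the `audio` key and the `text` key with the correct transcription (lowercase, no punctuation, as the instruction requires). The two transcriptions are the ones confirmed in the expert feedback shown later in this example; the file name control_tasks_example.json is hypothetical.

```python
import json

# Hypothetical reference file: task key ("audio") plus solution key ("text").
example_control_tasks = [
    {
        'audio': 'https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_216.wav',
        'text': 'he said that healthy eating was high on the council agenda',
    },
    {
        'audio': 'https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_146.wav',
        'text': 'it was deployed in the gulf war',
    },
]

with open('control_tasks_example.json', 'w') as f:
    json.dump(example_control_tasks, f, indent=2)
```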
In [17]:
control_objects = datasource.read_tasks(
    'control_tasks.json',
    task_spec_en.task_mapping,
    has_solutions=True,
)

Define task duration hint.

In [18]:
# audios are 3-10 seconds each, and workers need time to transcribe them
task_duration_hint = timedelta(seconds=20)

Task verification and feedback

Expert reward definition

Define estimated task duration hint for experts:

In [34]:
task_duration_hint = timedelta(seconds=30)
In [35]:
from crowdom import experts, project
In [37]:
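scenario = project.Scenario.EXPERT_LABELING_OF_TASKS
experts_task_spec = client.AnnotationTaskSpec(task_spec, lang, scenario)

# With reference labeling, experts verify it (check task); otherwise they annotate the source data.
# Note: this rebinds `objects`, shadowing the crowdom `objects` module imported at the top.
if control_objects:
    objects = control_objects
    experts_task_spec = experts_task_spec.check
else:
    objects = input_objects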

In-house experts

In [39]:
avg_price_per_hour = None

External Toloka experts

In [38]:
avg_price_per_hour = 3.5  # USD

Pricing config

In [40]:
pricing_options = pricing.get_expert_pricing_options(
    task_duration_hint, experts_task_spec.task_mapping, avg_price_per_hour)
pricing_config = pricing.choose_default_expert_option(pricing_options, avg_price_per_hour)

Getting feedback

In [ ]:
client.define_task(experts_task_spec, toloka_client)
In [50]:
raw_feedback = client.launch_experts(
    experts_task_spec,
    client.ExpertParams(
        task_duration_hint=task_duration_hint,
        pricing_config=pricing_config,
    ),
    objects[:10],
    experts.ExpertCase.TASK_VERIFICATION,
    toloka_client,
    interactive=True)
clear formula, which does not account for edge cases such as the minimum commission and incomplete assignments
$\displaystyle TotalPrice_{clear} = TaskCount * PricePerTask_\$ * Overlap * (1 + TolokaCommission) = 10 * 0.0003\$ * 1 * 1.3 = 0.00\$.$
precise formula, which accounts for all edge cases
$\displaystyle TotalPrice_{precise} = AssignmentCount * PricePerAssignment_\$ * Overlap = \lceil TaskCount / TasksOnAssignment \rceil * (PricePerAssignment_\$ + \max(PricePerAssignment_\$ * TolokaCommission, MinTolokaCommission_\$)) * Overlap = \lceil 10 / 29 \rceil * (0.01\$ + \max(0.01\$ * 0.3, 0.005\$)) * 1 = 1 * 0.015\$ * 1 = 0.01\$.$
run expert labeling of 10 objects for 0.01$? [Y/n] Y
2022-05-11 11:42:27,838 - crowdom.client.launch:launch_experts:207 - INFO: - expert labeling has started
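To double-check the cost estimate above, here is a sketch of both formulas in plain Python, using the values from the cell output: 10 tasks, 29 tasks per assignment, 0.01$ per assignment, overlap 1, 30% commission and a 0.005$ minimum commission. The function names are hypothetical helpers, not part of Crowdom.

```python
import math


def total_price_clear(task_count, price_per_task, overlap, commission):
    # Ignores the minimum commission and the rounding up to whole assignments.
    return task_count * price_per_task * overlap * (1 + commission)


def total_price_precise(task_count, tasks_on_assignment, price_per_assignment,
                        overlap, commission, min_commission):
    # Workers are paid per assignment, so a partially filled assignment costs as much as a full one.
    assignment_count = math.ceil(task_count / tasks_on_assignment)
    # The commission is a share of the assignment price, but never less than the minimum.
    price_with_commission = price_per_assignment + max(price_per_assignment * commission, min_commission)
    return assignment_count * price_with_commission * overlap


clear = total_price_clear(10, 0.0003, 1, 0.3)
precise = total_price_precise(10, 29, 0.01, 1, 0.3, 0.005)
print(f'clear: {clear:.4f}$, precise: {precise:.4f}$')
# clear: 0.0039$, precise: 0.0150$
```

The launcher prints these totals with two decimal places, which is why such small amounts appear as 0.00$ and 0.01$ in its output.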
In [51]:
worker_id_to_name = {'fd060a4d57b00f9bba4421fe4c7c22f3': 'bob'}  # {'< hex 32-digit id >': '< username >'}
In [100]:
feedback = client.ExpertLabelingResults(raw_feedback, experts_task_spec, worker_id_to_name)
In [101]:
feedback_df = feedback.get_results()

with pd.option_context("max_colwidth", 100):
    display(feedback_df)
audio text eval _ok _comment worker duration
0 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_216.wav he said that healthy eating was high on the council agenda True True correct transcription: "he said that healthy eating was high on the council agenda" bob 0 days 00:01:29.653100
1 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_146.wav it was deployed in the gold war False True correct transcription: "it was deployed in the gulf war" - "gulf war", not "gold war" bob 0 days 00:01:29.653100
2 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_099.wav they were under a lot of pressure from the other clubs True True correct transcription: "they were under a lot of pressure from the other clubs" bob 0 days 00:01:29.653100
3 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_178.wav its the real thing for sure True False not clear what to do with contractions - "it is", "its" or "it's" bob 0 days 00:01:29.653100
4 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_166.wav you are not going in blind True True bob 0 days 00:01:29.653100
5 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_347.wav sex offender programmes to be retained by public sector True True bob 0 days 00:01:29.653100
6 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_164.wav that must be left to the parole board True True bob 0 days 00:01:29.653100
7 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_261.wav similar measures are expected in England and Wales False True correct transcription: "similar measures are expected in england and wales", no punctiuation is ... bob 0 days 00:01:29.653100
8 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_105.wav that is no use True False not clear what to do with contractions - "that is", "thats" or "that's" bob 0 days 00:01:29.653100
9 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_029.wav their courage and their honesty should be respected True True correct transcription: "their courage and their honesty should be respected" bob 0 days 00:01:29.653100
In [62]:
task_duration_hint = feedback_df['duration'].mean().to_pytimedelta()  # with reference labeling
# task_duration_hint = timedelta(seconds=experts_proposed_value)  # without reference labeling
task_duration_hint
Out[62]:
datetime.timedelta(seconds=29, microseconds=908080)

During the annotation process, as a quality control measure, we show the gathered annotations to other workers and ask them to evaluate them; we call this process annotation check. Annotation check needs quality control of its own, so in addition to training for the main annotation process, we can create control objects and training for annotation check as well.

In [86]:
control_objects, _ = feedback.get_correct_objects(client.ExpertLabelingApplication.CONTROL_TASKS)

Creating worker training

Annotation training

In [103]:
training_objects, comments = feedback.get_correct_objects(application=client.ExpertLabelingApplication.TRAINING)
In [106]:
training_config = pricing.choose_default_training_option(
    pricing.get_training_options(task_duration_hint, len(training_objects), training_time=timedelta(minutes=2)))
In [ ]:
client.define_task(task_spec_en, toloka_client)
In [111]:
client.create_training(
    task_spec_en,
    training_objects,
    comments,
    toloka_client,
    training_config)
2022-05-11 12:06:27,566 - toloka.client:create_training:1736 - INFO: - A new training with ID "1173848" has been created.

Annotation check training

In [216]:
check_training_objects, check_comments = feedback.get_correct_objects(application=client.ExpertLabelingApplication.ANNOTATION_CHECK_TRAINING)
In [115]:
training_config = pricing.choose_default_training_option(
    pricing.get_training_options(task_duration_hint, len(check_training_objects), training_time=timedelta(minutes=2)))
In [ ]:
client.define_task(task_spec_en, toloka_client)
In [218]:
client.create_training(
    task_spec_en.check,
    check_training_objects,
    check_comments,
    toloka_client,
    training_config)
2022-05-11 18:15:37,631 - toloka.client:create_training:1736 - INFO: - A new training with ID "1174158" has been created.

Labeling efficiency optimization

You can skip any customization in this section and use the default options, which we consider suitable for a wide range of typical tasks, or tune the parameters to your liking.

For general information about labeling efficiency optimization and about customization for classification tasks, see the image classification example.

The annotation labeling process consists of two distinct subprocesses: the annotation step and the check step. You can interactively customize the parameters of each step independently.

Most parameters of the annotation step are the same as for classification, with one new addition: Assignment check sample.

With this option enabled, only a portion of the tasks in each assignment is checked; you can set this number with the Max tasks to check option. If enough of the checked tasks were done correctly, the whole assignment is finalized: all of its tasks are considered checked, and no more checks are created for them. You can set the required share of correctly done tasks with the Accuracy threshold option.

An assignment check sample can reduce the cost and duration of labeling, but low check coverage cannot guarantee high quality of the unchecked solutions.
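The assignment check sample logic described above can be sketched as follows. This is a hypothetical model, not Crowdom's implementation: each task's correctness is given as a boolean standing in for the checkers' verdict, the tasks to check are sampled at random, and `accuracy_threshold` plays the role of the Accuracy threshold option (assignment_accuracy_finalization_threshold in code). What happens to the rest of an assignment that is not finalized is not modelled here.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CheckSampleOutcome:
    checked_indices: List[int]
    accuracy: float
    finalized: bool  # True: the whole assignment counts as checked


def check_assignment_sample(
    solutions_ok: Sequence[bool],
    max_tasks_to_check: int,
    accuracy_threshold: float,
    rng: random.Random,
) -> CheckSampleOutcome:
    # solutions_ok[i] stands in for the verdict a checker would give to task i.
    n_checked = min(max_tasks_to_check, len(solutions_ok))
    checked = sorted(rng.sample(range(len(solutions_ok)), n_checked))
    accuracy = sum(solutions_ok[i] for i in checked) / n_checked if n_checked else 0.0
    return CheckSampleOutcome(checked, accuracy, accuracy >= accuracy_threshold)


outcome = check_assignment_sample([True] * 20, max_tasks_to_check=15,
                                  accuracy_threshold=0.85, rng=random.Random(0))
print(outcome.finalized, len(outcome.checked_indices))
# True 15
```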

If the annotation and check steps take significantly different amounts of time to complete, you can specify a separate task_duration_hint for each of them.

In [29]:
params_form = labeling_params.get_annotation_interface(
    task_spec=task_spec_en,
    check_task_duration_hint=task_duration_hint,
    annotation_task_duration_hint=task_duration_hint,
    toloka_client=toloka_client)
In [20]:
check_params, annotation_params = params_form.get_params()

Efficiency customization

You can define your own pricing config for labeling.

However, you can only specify real_tasks_count and assignment_price in it: control tasks cannot be used directly for quality control of annotation labeling, so control_tasks_count must stay 0.

In [25]:
from crowdom import classification, classification_loop, control, evaluation, worker
In [26]:
pricing_config = pricing.PoolPricingConfig(assignment_price=0.05, real_tasks_count=20, control_tasks_count=0)

assert pricing_config.control_tasks_count == 0
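As a sanity check of this config, here is a rough sketch of what it implies per task and per hour of work, assuming the task duration hint of about 29.9 s obtained from expert feedback (Out[62]) and ignoring the Toloka commission. Variable names are prefixed with example_ so that running the cell does not overwrite task_duration_hint.

```python
from datetime import timedelta

example_assignment_price = 0.05  # USD, as in pricing_config above
example_tasks_per_assignment = 20  # real_tasks_count above
example_hint = timedelta(seconds=29.9)  # approximate value of Out[62]

example_price_per_task = example_assignment_price / example_tasks_per_assignment
example_assignment_time = example_hint * example_tasks_per_assignment
# timedelta / timedelta gives a float: here, the assignment length in hours
example_hourly_rate = example_assignment_price / (example_assignment_time / timedelta(hours=1))
print(f'{example_price_per_task:.4f} USD per task, ~{example_hourly_rate:.2f} USD per hour')
# 0.0025 USD per task, ~0.30 USD per hour
```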

Define quality and control params:

In [27]:
assignment_check_sample = evaluation.AssignmentCheckSample(
    max_tasks_to_check=15,
    assignment_accuracy_finalization_threshold=0.85,
)

You can specify a custom overlap; for annotation, the minimum overlap (min_overlap) should always be 1:

In [28]:
correct_done_task_ratio_for_acceptance = 0.5

control_params = control.Control(
    rules=control.RuleBuilder().add_static_reward(
        threshold=correct_done_task_ratio_for_acceptance).add_speed_control(
            # if a worker completes tasks in 10% of the expected time, we reject the assignment, assuming fraud/scripts/random clicking
            # specify 0 to disable this control option
            ratio_rand=.1,
            # if a worker completes tasks in 30% of the expected time, we block them for a while, suspecting poor performance
            # specify 0 to disable this control option
            ratio_poor=.3,
        ).build())


annotation_params = client.AnnotationParams(
    task_duration_hint=task_duration_hint,
    pricing_config=pricing_config,
    overlap=classification_loop.DynamicOverlap(min_overlap=1, max_overlap=3, confidence=0.85),
    control=control_params,
    assignment_check_sample=assignment_check_sample,
    worker_filter=worker.WorkerFilter(
        filters=[
             worker.WorkerFilter.Params(
                 langs={worker.LanguageRequirement(lang=lang)},
                 regions=worker.lang_to_default_regions.get(lang, {}),
                 age_range=(18, None),
             ),
        ],
        training_score=None,
    ),
)

assert isinstance(annotation_params.overlap, classification_loop.DynamicOverlap)
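The intent of the control rules above can be summarized as a small decision function. This is a hypothetical sketch of the behaviour described in the code comments (speed ratios 0.1 and 0.3, acceptance threshold 0.5), not Crowdom's rule engine; in particular, how rejection and blocking interact in Crowdom is an assumption here.

```python
from datetime import timedelta
from typing import Tuple


def review_assignment(
    duration: timedelta,
    expected: timedelta,
    correct_ratio: float,
    acceptance_threshold: float = 0.5,
    ratio_rand: float = 0.1,
    ratio_poor: float = 0.3,
) -> Tuple[bool, bool]:
    """Return (accepted, blocked) for one assignment under the assumed rules."""
    speed_ratio = duration / expected  # fraction of the expected time actually spent
    # A ratio of 0 disables the corresponding speed check, as in the comments above.
    too_fast = ratio_rand > 0 and speed_ratio < ratio_rand
    suspiciously_fast = ratio_poor > 0 and speed_ratio < ratio_poor
    accepted = not too_fast and correct_ratio >= acceptance_threshold
    return accepted, suspiciously_fast
```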

Labeling of your data

In [ ]:
client.define_task(task_spec_en, toloka_client)
In [188]:
assert control_objects, 'No control objects supplied'
assert isinstance(control_objects[0], tuple)

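# Control objects may come as (task, solution) pairs in annotation format; if they do not
# validate against the check task mapping, convert them to check format:
# (task + solution) marked as a correct annotation.
try:
    task_spec_en.check.task_mapping.validate_objects(control_objects[0][0])
except Exception:
    control_objects = [(task + solution, (base.BinaryEvaluation(ok=True),)) for (task, solution) in control_objects]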
In [192]:
artifacts = client.launch_annotation(
    task_spec_en,
    annotation_params,
    check_params,
    input_objects,
    control_objects,
    toloka_client)
2022-05-11 13:55:59,666 - crowdom.client.launch:launch_annotation:266 - INFO: - annotation has started

In [195]:
results = artifacts.results

Results study

Ground truth (most probable option):

In [197]:
with pd.option_context("max_colwidth", 100):
    display(results.predict())
audio text
0 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_398.wav but they are still unlikely to be involved in combat
2 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_213.wav he is slightly confused
6 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_398.wav i was very young
10 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_017.wav others have tried to explain phenomenon physically
14 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have excepted it as a miracle without any physical explanation
19 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_259.wav it is very nice
23 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_296.wav it was not just the character and the energy of the playing
26 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_264.wav the work between musicians and the fire is very important
28 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_345.wav instead he was informed to join the line of creditors
30 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_001.wav please call stella
34 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_042.wav you must always attempt to raise the bar
36 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_101.wav however the intensive care unit at the southern general hospital is full
38 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill would be massive
43 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_411.wav you wanted the evidence
45 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_306.wav it should be equal
47 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_093.wav he was possessed
51 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_023.wav if the red of the second bow falls upon the green of the first the result is to give a bow with ...
55 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_333.wav like his acting it was an accident
57 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_309.wav we are now looking at the degrees of injury
61 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_362.wav you must have a government and a good civil service

All gathered annotations with respective confidence values:

In [198]:
with pd.option_context("max_colwidth", 100):
    display(results.predict_proba())
audio text confidence
0 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_398.wav but they are still unlikely to be involved in combat 0.991803
2 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_213.wav he is slightly confused 0.995690
4 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_213.wav he was slightly confused 0.008197
6 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_398.wav i was very young 0.995690
10 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_017.wav others have tried to explain phenomenon physically 0.995690
12 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_017.wav others have tried to explain phenomena physically 0.008197
14 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have excepted it as a miracle without any physical explanation 0.995690
16 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have accepted it as a miracle without physical explanation 0.045455
19 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_259.wav it is very nice 0.995690
21 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_259.wav its very nice 0.008197
23 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_296.wav it was not just the character and the energy of the playing 0.995690
25 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_296.wav it wasnt just the character and the energy of the play NaN
26 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_264.wav the work between musicians and the fire is very important 0.991803
28 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_345.wav instead he was informed to join the line of creditors 0.991803
30 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_001.wav please call stella 0.995690
32 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_001.wav please call Stella 0.008197
34 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_042.wav you must always attempt to raise the bar 0.991803
36 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_101.wav however the intensive care unit at the southern general hospital is full 0.991803
38 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill would be massive 0.995690
40 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill will be massive 0.045455
43 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_411.wav you wanted the evidence 0.991803
45 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_306.wav it should be equal 0.991803
47 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_093.wav he was possessed 0.995690
51 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_023.wav if the red of the second bow falls upon the green of the first the result is to give a bow with ... 0.995690
55 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_333.wav like his acting it was an accident 0.991803
57 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_309.wav we are now looking at the degrees of injury 0.995690
61 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_362.wav you must have a government and a good civil service 0.991803
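results.predict() already returns the most probable transcription for each audio. If you also want to flag items for manual review, a sketch like the following picks the top-confidence annotation per audio from results.predict_proba() and lists those below a threshold. It assumes the audio, text and confidence columns shown above; the function name and the 0.9 default threshold are illustrative choices.

```python
import pandas as pd


def best_and_uncertain(proba: pd.DataFrame, min_confidence: float = 0.9):
    # Annotations without a confidence value (NaN, e.g. never evaluated) are skipped.
    scored = proba.dropna(subset=['confidence'])
    # For each audio, keep the row with the highest confidence.
    best = scored.loc[scored.groupby('audio')['confidence'].idxmax()]
    # Items whose best annotation is still below the threshold may need manual review.
    uncertain = best[best['confidence'] < min_confidence]
    return best, uncertain
```

For example: `best, uncertain = best_and_uncertain(results.predict_proba())`.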

Detailed information about each annotation and each check for it:

In [210]:
with pd.option_context('max_colwidth', 150), pd.option_context('display.max_rows', 100):
    display(results.worker_labels())
audio text annotator annotation_overlap confidence evaluation_overlap eval evaluator evaluator_weight
0 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_398.wav but they are still unlikely to be involved in combat fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
1 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_398.wav but they are still unlikely to be involved in combat fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
2 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_213.wav he is slightly confused 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
3 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_213.wav he is slightly confused 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
4 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_213.wav he was slightly confused fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
5 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_213.wav he was slightly confused fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False 6d85abd870df2592ef79175f99b5b93c 0.916667
6 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_398.wav i was very young 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
7 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_398.wav i was very young 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
8 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_398.wav i was very young fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
9 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_398.wav i was very young fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
10 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_017.wav others have tried to explain phenomenon physically 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
11 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_017.wav others have tried to explain phenomenon physically 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
12 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_017.wav others have tried to explain phenomena physically fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
13 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_017.wav others have tried to explain phenomena physically fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False 6d85abd870df2592ef79175f99b5b93c 0.916667
14 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have excepted it as a miracle without any physical explanation 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
15 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have excepted it as a miracle without any physical explanation 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
16 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have accepted it as a miracle without physical explanation fd060a4d57b00f9bba4421fe4c7c22f3 2 0.045455 3 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
17 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have accepted it as a miracle without physical explanation fd060a4d57b00f9bba4421fe4c7c22f3 2 0.045455 3 False 6d85abd870df2592ef79175f99b5b93c 0.916667
18 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_013.wav some have accepted it as a miracle without physical explanation fd060a4d57b00f9bba4421fe4c7c22f3 2 0.045455 3 False f87548bd9c317ed987e22c8ebe3dea3c 0.954545
19 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_259.wav it is very nice 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
20 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_259.wav it is very nice 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
21 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_259.wav its very nice fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
22 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_259.wav its very nice fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False 6d85abd870df2592ef79175f99b5b93c 0.916667
23 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_296.wav it was not just the character and the energy of the playing 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
24 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_296.wav it was not just the character and the energy of the playing 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
25 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_296.wav it wasnt just the character and the energy of the play fd060a4d57b00f9bba4421fe4c7c22f3 2 NaN 0 NaN NaN NaN
26 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_264.wav the work between musicians and the fire is very important fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
27 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_264.wav the work between musicians and the fire is very important fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
28 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_345.wav instead he was informed to join the line of creditors fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
29 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_345.wav instead he was informed to join the line of creditors fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
30 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_001.wav please call stella 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
31 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_001.wav please call stella 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
32 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_001.wav please call Stella fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
33 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_001.wav please call Stella fd060a4d57b00f9bba4421fe4c7c22f3 2 0.008197 2 False 6d85abd870df2592ef79175f99b5b93c 0.916667
34 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_042.wav you must always attempt to raise the bar fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
35 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_042.wav you must always attempt to raise the bar fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
36 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_101.wav however the intensive care unit at the southern general hospital is full fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
37 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_101.wav however the intensive care unit at the southern general hospital is full fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
38 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill would be massive 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
39 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill would be massive 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
40 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill will be massive fd060a4d57b00f9bba4421fe4c7c22f3 2 0.045455 3 False fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
41 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill will be massive fd060a4d57b00f9bba4421fe4c7c22f3 2 0.045455 3 True 6d85abd870df2592ef79175f99b5b93c 0.916667
42 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_434.wav the bill will be massive fd060a4d57b00f9bba4421fe4c7c22f3 2 0.045455 3 False f87548bd9c317ed987e22c8ebe3dea3c 0.954545
43 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_411.wav you wanted the evidence fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
44 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_411.wav you wanted the evidence fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
45 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_306.wav it should be equal fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
46 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_306.wav it should be equal fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
47 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_093.wav he was possessed 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
48 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_093.wav he was possessed 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
49 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_093.wav he was possessed fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
50 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_093.wav he was possessed fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
51 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_023.wav if the red of the second bow falls upon the green of the first the result is to give a bow with an abnormally wide yellow band since red and green... 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
52 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_023.wav if the red of the second bow falls upon the green of the first the result is to give a bow with an abnormally wide yellow band since red and green... 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
53 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_023.wav if the red of the second bow falls upon the green of the first the result is to give a bow with an abnormally wide yellow band since red and green... fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
54 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_023.wav if the red of the second bow falls upon the green of the first the result is to give a bow with an abnormally wide yellow band since red and green... fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
55 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_333.wav like his acting it was an accident fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
56 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p257_333.wav like his acting it was an accident fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667
57 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_309.wav we are now looking at the degrees of injury 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
58 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_309.wav we are now looking at the degrees of injury 6d85abd870df2592ef79175f99b5b93c 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
59 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_309.wav we are now looking at the degrees of injury fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True 155593e4339bc240a2863da87fcdb856 0.916667
60 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_309.wav we are now looking at the degrees of injury fd060a4d57b00f9bba4421fe4c7c22f3 2 0.995690 2 True f87548bd9c317ed987e22c8ebe3dea3c 0.954545
61 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_362.wav you must have a government and a good civil service fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True fd060a4d57b00f9bba4421fe4c7c22f3 0.916667
62 https://tlk.s3.yandex.net/ext_dataset/noisy_speech/noisy_tested_wav/p232_362.wav you must have a government and a good civil service fd060a4d57b00f9bba4421fe4c7c22f3 1 0.991803 2 True 6d85abd870df2592ef79175f99b5b93c 0.916667

Labeling quality verification

Quality verification closely resembles task verification with reference labeling, but its options differ slightly. You can run it on a random sample of labeled objects:
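The sample-size rule used in the next cells (at most 20 objects, or 10% of the dataset if that is smaller) can be sketched in plain Python. This is a minimal illustration, not part of Crowdom: `pick_verification_sample` and its `max_size`, `fraction` and `seed` parameters are hypothetical names. Note that `int(0.1 * n)` rounds down, so datasets with fewer than 10 objects yield an empty sample under this rule.

```python
import random


def pick_verification_sample(items, max_size=20, fraction=0.1, seed=None):
    """Hypothetical helper: pick a random subset for expert verification.

    Mirrors the notebook rule min(20, int(0.1 * len(items))).
    """
    items = list(items)  # random.sample needs a sequence, not an iterator
    size = min(max_size, int(fraction * len(items)))
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    return rng.sample(items, size)  # sampling without replacement


sample = pick_verification_sample(range(100), seed=0)
print(len(sample))  # 10
```

Passing an explicit `seed` is optional; it only matters if you want to re-create the same verification sample later.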

In [211]:
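import random

# Crowdom modules used below for the expert scenario and expert case
from crowdom import experts, project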
In [212]:
scenario = project.Scenario.EXPERT_LABELING_OF_SOLVED_TASKS
experts_task_spec = client.AnnotationTaskSpec(task_spec, lang, scenario)

# Verify at most 20 objects, or 10% of the dataset if that is smaller
sample_size = min(20, int(0.1 * len(input_objects)))
# A separate name keeps the `objects` module imported from crowdom from being shadowed
verification_objects = random.sample(client.select_control_tasks(input_objects, results.raw, min_confidence=.0), sample_size)
In [213]:
client.define_task(experts_task_spec, toloka_client)
2022-05-16 15:45:21,607 - crowdom.client.task:define_task:125 - INFO: - no changes in task
In [214]:
raw_feedback = client.launch_experts(
    experts_task_spec,
    client.ExpertParams(
        task_duration_hint=task_duration_hint,
        pricing_config=pricing_config,
    ),
    verification_objects,
    experts.ExpertCase.LABELING_QUALITY_VERIFICATION,
    toloka_client,
    interactive=True)
Clear formula, which does not account for edge cases such as the minimum commission and incomplete assignments:
$\displaystyle TotalPrice_{clear} = TaskCount * PricePerTask_\$ * Overlap * (1 + TolokaCommission) = 10 * 0.0003\$ * 1 * 1.3 \approx 0.00\$.$
Precise formula, which accounts for all edge cases:
$\displaystyle TotalPrice_{precise} = AssignmentCount * PricePerAssignment_\$ * Overlap = \lceil TaskCount / TasksOnAssignment \rceil * (PricePerAssignment_\$ + \max(PricePerAssignment_\$ * TolokaCommission, MinTolokaCommission_\$)) * Overlap = \lceil 10 / 29 \rceil * (0.01\$ + \max(0.01\$ * 0.3, 0.005\$)) * 1 = 1 * 0.015\$ * 1 \approx 0.01\$.$
run expert labeling of 10 objects for 0.01$? [Y/n] Y
2022-05-16 15:40:26,018 - crowdom.client.launch:launch_experts:209 - INFO: - expert labeling has started
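The two cost formulas from the estimate can be reproduced to check the arithmetic. This is a sketch, not Crowdom's pricing code: `clear_price` and `precise_price` are hypothetical names, and the inputs are the values printed in the estimate (10 tasks, 29 tasks per assignment, 0.01$ per assignment, 0.0003$ per task, 30% commission, 0.005$ minimum commission, overlap 1). `Decimal` avoids binary floating-point noise in money values.

```python
import math
from decimal import Decimal


def clear_price(task_count, price_per_task, overlap, commission):
    # Ignores the minimum commission and partially filled assignments
    return task_count * price_per_task * overlap * (1 + commission)


def precise_price(task_count, tasks_on_assignment, price_per_assignment,
                  overlap, commission, min_commission):
    # Every started assignment is paid in full...
    assignment_count = math.ceil(task_count / tasks_on_assignment)
    # ...and the platform fee per assignment is at least the minimum commission
    fee = max(price_per_assignment * commission, min_commission)
    return assignment_count * (price_per_assignment + fee) * overlap


print(clear_price(10, Decimal("0.0003"), 1, Decimal("0.3")))  # 0.00390
print(precise_price(10, 29, Decimal("0.01"), 1,
                    Decimal("0.3"), Decimal("0.005")))  # 0.015
```

The unrounded precise total is 0.015$, which the estimate above displays as 0.01$. Here the minimum commission (0.005$) exceeds 30% of the assignment price (0.003$), so it dominates the fee, and even 10 tasks are billed as one full assignment.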
In [215]:
test_results = client.ExpertLabelingResults(raw_feedback, experts_task_spec)
In [216]:
test_results.get_accuracy()
Out[216]:
1.0
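The notebook does not show how `get_accuracy()` is defined. Assuming it reports the share of sampled solutions that experts judged correct, the value can be illustrated with a minimal sketch; `verification_accuracy` and its list of boolean verdicts are hypothetical and not part of Crowdom's API.

```python
def verification_accuracy(verdicts):
    """Hypothetical: fraction of checked solutions that experts accepted.

    `verdicts` is assumed to be a list of booleans, one per verified object.
    """
    if not verdicts:
        raise ValueError("no expert verdicts to aggregate")
    return sum(verdicts) / len(verdicts)  # True counts as 1, False as 0


print(verification_accuracy([True] * 10))  # 1.0
print(verification_accuracy([True, False, True, True]))  # 0.75
```

Under this assumption, an accuracy of 1.0 on the 10 sampled objects would mean experts accepted every checked transcription.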