TurkerView

Conversational AI

No Institutional Affiliation
  • Reviews: 648
  • HITs: 358

Conversational AI Ratings

  • Reward Sentiment: Workers feel this requester pays fairly
  • Communication: Unrated
  • Approval Speed: Approves Quickly
  • Rejections: Rejections Reported
  • Blocks: No Blocks

Conversational AI Wage History


[Wage history chart not reproduced in this text version.]

Top Worker Reviews

AfterDarkMark (Average Pace)
Reviews: 1,229 | Points: 4,923 | Ratings: 483

HIT: Insert an extra question about a new subject in a given dialogue - $0.15

  • Reward Sentiment: Unrated
  • Communication: Unrated
  • Approval: Pending
  • Hourly Rate: $12.00 / hour
  • Completion Time: 00:00:45

Pros

First, this is going to be long, and what I write under pros/cons may not strictly be pros and cons; it should probably just be read straight through from pros to cons, then advice if I put anything there. It's as much a review of this HIT as it is of other batches, requesters, workers, and the state of mturk right now.

These HITs used to be ones I considered "bread and butter". I did thousands of them in the past without a single rejection, but on the last batch I worked I got a few, and it seems like other workers got even more (the "other review site" shows even more complaints). I had pretty much sworn these off after the batch I worked in April, but decided to do 100 tonight and see how it turns out. We'll see.

I consider creative writing/rewriting tasks to be one of my strengths, so I focus on tasks like this a lot. These tasks build upon each other. They are very similar to those from some other requesters, like Alexandria, where one batch leads to another and so on. I guess I am writing this for more experienced mturkers who know the kind of tasks I am talking about, but for those of you who aren't familiar with them: in one batch you might write a question, in another you might write an answer, and this simulates an AI question/answer system, human conversation, storytelling, etc.

Cons

Because of this, poor work in one batch makes future batches more difficult: workers on a later task have to notice when someone screwed up, or they may make mistakes without even realizing it. I think that's where my rejections came from in the last batch. Right at the end of the batch, I noticed some dialogues that were just "off", but I still did them and submitted, when I probably should have returned them.

Now, going back to Alexandria and their affiliated accounts: they approve, but soft block bad workers, which I have always appreciated because it weeds out bad workers and leaves more work for good workers, even though in the short term "good" workers have to work a little harder because of the poor responses from others.

With this account, though, they don't do that, and it seems like a point has been hit where the work has gone downhill (possibly due in part to Covid-19 and more workers working more on mturk, and possibly also due to their 99% approval rating, explained below), so now the requester is starting to reject a lot more tasks. And given the tens if not hundreds of thousands of tasks this requester has posted in the last few years, that won't be reflected in the approval rating anytime soon, if ever. As a side note, this is a good reason to be careful even when a requester's approval is 99%. You never know...

Additionally, this requester is notorious for a lack of communication. I wish that weren't so, because I feel like I (and a lot of other workers) could really help this requester figure out how to get better data and save money on their end, which I pride myself on doing with other requesters (especially the great ones).

In the batch I just did, even compared to a month ago, the previously completed work is pretty sloppy and, to be honest, just plain bad (not most of it, but I'll explain more in a sec). I think that's a result of rejections starting to happen a while back. Consequently, more legit/"good" workers started staying away. Then, because of the 99% approval and more tasks being available, more shoddy workers assumed everything they submitted would be auto-approved, did half-a$$ed work, or shouldn't have been working on writing tasks in the first place.

This probably led to more rejections, and now I expect the cycle will continue, barring the requester making changes or bad workers giving up because they are getting slammed with rejections. (Another side note: a similar situation killed Alexandria work for a while a year or so ago, because people just started spamming writing HITs, knowing they would be approved and not caring about a soft block.)

I've seen other good workers post that they are done with this requester. I'm kind of in that place for now too, though as I said, I decided to test the water again tonight. However, if you used to work these tasks a lot, take a look at the batches briefly, even if you aren't going to work them. I did 100, but I've never returned so many while working these batches.

Why? There were tasks where the user clearly thought he or she was the agent instead of the user/question asker, and vice versa. Tasks where people screwed up, which led to multiple hotels, cabs, or restaurants being booked within the dialogue. Tasks where it is clear that workers just clicked a random bubble and entered the question or answer (for the "sister" batch) without paying attention to what part of the conversation they were inserting a question into (example: the hotel was booked in the early part of the conversation, then a cab, then a restaurant, but someone just randomly throws a restaurant question into the early part of the cab booking).

My hourly on the 100 I did was definitely lower than in the past, but I wanted to see whether, if I stuck to the ones that were straightforward and hadn't been sullied by "bad" workers, I would receive any rejections this time. That meant even more careful reading and returning quite a few.

Sorry for going on and on, but like I said, this requester used to pay my rent, or part of it, at times, and I tried to do good, honest work (I always do, but for requesters that really do pay fairly, I try to go above and beyond).

I'm seeing this more and more lately, and not just with batch work but with surveys too. It seems like poor work is causing more rejections, which in other cases lowers requesters' approval ratings, which keeps good workers away, which leads to cycles like the one I mentioned above.

I don't know that there is a "cure", especially in a case like this, where the requester won't adapt or figure out a way to fix things without driving away the workers who are completing tasks properly.

I will say, if this requester upped the minimum approval rating required to do these tasks, that would probably help a lot; even just soft blocking bad workers or setting up a qual task would help too.

Sorry for going on and on about this, and for repeating myself at times, but these are some of the conclusions I've reached from my experience tonight. And this hurts mturk as a whole. It drives requesters away. It takes money out of all of our pockets. With the size of the batch today, I would have made an extra $100-150 had I not been wary of a mass rejection.

If you are going to risk working on these at this point, here are a few tips:

- Return tasks where the dialogue is super screwed up or out of order.
- Return tasks where you have to ask a question that has already been asked in the dialogue.
- Return tasks where the question you are being asked to insert doesn't fit anywhere in the dialogue (ex. you have a hotel question, but the dialogue is only about a restaurant and cab booking).
- Stay away in general if you're not a good writer / native English speaker.
- *Don't copy/paste or use the same question or answer twice; this may be what caused rejections for other workers. Add extra words and phrases, and vary how you ask questions and give answers.*

I guess that's all I have to say. I might edit this later to fix typos or make it more coherent, because I kind of digressed, but I hope some of you can connect with this or take something from it.

Good luck turking, and be careful out there. If you don't have a Masters qual yet, be extra careful: a couple of batch rejections early on kept me from getting one for 5+ years.

And this is how you waste an hour on mturk. hahaha

Advice to Requester

I don't think anyone from your account or research team will ever check this, but in case you do:

- Up the approval rating requirement if you want better data.
- Consider a qual task.
- Weed out bad workers.
- Stop driving away good workers with sometimes seemingly random rejections. Manually review rejections, at least.
- *Communicate with us. We can save you a lot of money and help you get much higher quality data.*

There's probably a lot more advice to be given that I can't think of. I hope at some point you get in touch with some workers or figure out how to make your HITs great again. The batch I just worked is indicative of the cycle I mentioned above: you are driving away better workers, and just looking through your HITs, it is easy to see that the quality of the dialogues is going down drastically because of it.
May 15, 2020 | 18 workers found this helpful.

Marta9227 (Relaxed Pace)
Reviews: 20 | Points: 70 | Ratings: 32

HIT: Insert an extra question about a new subject in a given dialogue - $0.15

  • Reward Sentiment: Unrated
  • Communication: Unrated
  • Approval: Pending
  • Hourly Rate: $9.82 / hour
  • Completion Time: 00:00:55

Pros

Quick

Cons

They will reject a bunch of HITs for arbitrary reasons. I think they reject a certain percentage no matter what. In my case, they rejected for a reason not listed ANYWHERE in the instructions. My advice is to avoid these at all costs. It seems like a quick and easy batch, which it is, but there is a good chance they will randomly reject a portion of your HITs.

Advice to Requester

Either stop being sheisty and randomly rejecting, or put the specific things you require in the HIT instructions.
May 11, 2020 | 3 workers found this helpful.

Hedgmog (Careful Reader)
Reviews: 10,159 | Points: 22,045 | Ratings: 1,426

HIT: Chatbot evaluation - $0.80

  • Reward Sentiment: Low
  • Communication: Unrated
  • Approval: Approved
  • Hourly Rate: $10.59 / hour
  • Completion Time: 00:04:32

Pros

Worked within the MTurk system (no external links). Read the instructions and have a conversation with an AI, for a total of 10 exchanges as per the instructions. Pretty simple to work through, and could be done quicker.

Cons

This was the first time that the AI responses were really nonsensical.
May 17, 2022


Requester ID: AXGV36Y8I76HV

Recently Reviewed HITs


[Pilot] Embodied Task Completion
A qualitative HIT for Chatbot evaluation
A Qualitative HIT for DSTC9 turn-level annotation
A qualitative HIT to rate the quality of a Chatbot response└[∵┌]└[ ∵ ]┘[┐∵]┘
A qualitative HIT to rate the quality of each turn in a conversation

Ratings Legend

  • Wage Aggregates
  • Reward Sentiment
  • Communication Scores
  • Approval Tracking

Wage Aggregate Tracking

This is fairly straightforward: we take the completion time and the reward amount (where available) and calculate the average hourly rate for the task. We then map that number onto a simple range based on US minimum wage standards, color-coding the data so the numbers are easy to digest at a glance. (A small sketch of this calculation follows the table below.)

Color  | Pay Range (Hourly)  | Explanation
RED    | < $7.25 / hr        | Hourly averages below the US federal minimum wage
ORANGE | $7.25 - $10.00 / hr | Hourly averages between the federal and the highest statewide (CA) minimum wage
GREEN  | > $10.00 / hr       | Hourly averages above all US minimum wage standards
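
For the curious, the arithmetic is simple enough to sketch. Below is a minimal, hypothetical Python version of the calculation (the function names and structure are illustrative only, not TurkerView's actual code), using the $0.15 dialogue HIT reviewed above as a worked example:

FEDERAL_MIN_WAGE = 7.25     # US federal minimum wage, $/hr (RED threshold)
TOP_STATE_MIN_WAGE = 10.00  # highest statewide (CA) minimum wage, $/hr (GREEN threshold)

def hourly_rate(reward_usd, completion_seconds):
    """Average hourly rate implied by a HIT's reward and completion time."""
    return reward_usd / completion_seconds * 3600

def wage_color(rate):
    """Bucket an hourly rate into the RED / ORANGE / GREEN ranges above."""
    if rate < FEDERAL_MIN_WAGE:
        return "RED"
    if rate <= TOP_STATE_MIN_WAGE:
        return "ORANGE"
    return "GREEN"

# Worked example: the $0.15 HIT at AfterDarkMark's 45-second completion time.
rate = hourly_rate(0.15, 45)                    # 0.15 / 45 * 3600 = 12.0
print(f"${rate:.2f}/hr -> {wage_color(rate)}")  # $12.00/hr -> GREEN

By the same math, Marta9227's 55-second completion time on the same $0.15 HIT works out to 0.15 / 55 * 3600 ≈ $9.82/hr, which lands in ORANGE.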

Reward Sentiment

Not all HITs are created equal. Sometimes an hourly wage doesn't convey the full story of a HIT's true worth, so we encourage workers to give their opinion on the overall pay of the task. Was it $8/hr to rate pictures of puppies? A worker could justifiably bump up the rating a bit for something so adorable. 10 hours locked in Inquisit? Even for $10/hr many workers would appreciate the heads up on such a task. The Pay Sentiment rating helps connect workers beyond the hard data.

Underpaid (1/5)
  • Very low or no pay
  • Frustrating work experience
  • Inadequate instructions

Low (2/5)
  • Below US min-wage ($7.25/hr)
  • No redeeming qualities to make up for pay

Fair (3/5)
  • Minimum wage for the task (consider SE taxes!)
  • Work experience offers nothing to tip the scales in a positive or negative direction

Good (4/5)
  • Pay is above minimum wage, or compensates better than average for the level of effort required
  • The overall work experience makes up for borderline wages

Generous (5/5)
  • Pay is exceptional
  • Interesting, engaging work or work environment
  • Concise instructions, well-designed HIT

Communication Ratings

Communication is an underrated aspect of mTurk. Clear, concise directions, a fast response to a clarification question, or a resolution to a workflow suggestion can all be valuable aspects of interaction between Requesters and Workers, and it's worth keeping track of. Plus, everyone enjoys the peace of mind of knowing that if something does go wrong, there will be an actual human getting back to you to solve the issue.

Unacceptable (1/5)
  • No response at all
  • Rude response without a resolution

Poor (2/5)
  • Responsive, but unhelpful
  • Required IRB or extra intervention

Acceptable (3/5)
  • Responded in a reasonable timeframe
  • Resolves issues to a minimum level of satisfaction

Good (4/5)
  • Prompt response
  • Positive resolution

Excellent (5/5)
  • Prompt response time
  • Friendly & professional
  • Helpful / solved issues
  • Interacts within the community

Approval Time Tracking

This rating is strictly for approval times. Let's face it, no one wants to mix approval-time ratings with how fast a Requester rejects a HIT, so we've saved rejection flags for another category. This provides a more straightforward way to know how long your HIT might sit pending before paying out. The default auto-approval for most MTurk tasks is 3 days and the maximum is 30 days, so we've tried to base our ratings around those data points. (A small sketch of this bucketing follows the table below.)

Rating    | Score | Approval Time
Very Slow | 1/5   | Over 2 weeks
Slow      | 2/5   | ~1 - 2 weeks
Average   | 3/5   | ~3 - 7 days
Fast      | 4/5   | ~1 - 3 days
Very Fast | 5/5   | ~24 hours or less
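
As a rough illustration of that bucketing (again a hypothetical Python sketch, not TurkerView's actual code), the rating amounts to a simple threshold function over the approval time in hours:

def approval_rating(approval_hours):
    """Map a typical approval time to the 1-5 scale in the table above."""
    if approval_hours <= 24:
        return 5, "Very Fast"
    if approval_hours <= 3 * 24:
        return 4, "Fast"
    if approval_hours <= 7 * 24:
        return 3, "Average"
    if approval_hours <= 14 * 24:
        return 2, "Slow"
    return 1, "Very Slow"

# MTurk's 3-day default auto-approval window falls right at the "Fast" cutoff.
print(approval_rating(3 * 24))   # (4, 'Fast')
# The 30-day maximum is well past two weeks, so it rates "Very Slow".
print(approval_rating(30 * 24))  # (1, 'Very Slow')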
