anonymized using a hash so that joins can still be done post-hoc
timestamped so that trends can be visualized
Features include:
customer_hash: string hash of customer_no
group_customer_hash: string hash of group_customer_no
timestamp: submission timestamp
survey: filename / title of the survey
question: text of the question
subquestion: text of the subquestion (or NA if none)
answer: text of the response
encoded_answer: numeric encoding of the answer (e.g. integers for a Likert scale)
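As a rough illustration of the layout described above, the following sketch builds a tiny table with the same columns and joins it to another table on customer_hash. Everything in it is an assumption made for the example: toy_hash() is a hypothetical stand-in and not the hashing algorithm this package uses, the customer numbers, survey title and segment labels are invented, and the five Likert labels are one plausible scale rather than the encoding applied by survey_stream().

```r
# Hypothetical stand-in for the package's hashing step (NOT the real algorithm).
toy_hash <- function(x) {
  vapply(as.character(x), function(s) {
    h <- 5381
    for (b in utf8ToInt(s)) h <- (h * 33 + b) %% 2147483647
    sprintf("%08x", as.integer(h))
  }, character(1), USE.NAMES = FALSE)
}

# Assumed Likert labels, ordered from lowest to highest.
likert_levels <- c("Strongly disagree", "Disagree", "Neutral",
                   "Agree", "Strongly agree")

# Made-up responses in the long format listed above.
responses <- data.frame(
  customer_hash = toy_hash(c(101, 102, 101)),
  group_customer_hash = toy_hash(c(201, 202, 201)),
  timestamp = as.POSIXct(c("2023-01-05 10:00:00", "2023-01-06 12:30:00",
                           "2023-02-01 09:15:00"), tz = "UTC"),
  survey = "example_survey",
  question = "The performance met my expectations",
  subquestion = NA_character_,
  answer = c("Agree", "Neutral", "Strongly agree"),
  stringsAsFactors = FALSE
)

# Integer encoding of the answer: position of the label on the assumed scale.
responses$encoded_answer <- as.integer(
  factor(responses$answer, levels = likert_levels)
)

# A second table hashed the same way can be joined without exposing customer_no.
customers <- data.frame(
  customer_hash = toy_hash(c(101, 102)),
  segment = c("subscriber", "single ticket"),
  stringsAsFactors = FALSE
)
joined <- merge(responses, customers, by = "customer_hash")
```

The join works only because both tables were hashed with the same function; any table hashed differently would not match.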
Usage
survey_stream(
survey_dir = config::get("tessistream")$survey_dir,
reader = survey_monkey
)
Arguments
- survey_dir
directory of surveys to parse
- reader
function(filename) that reads survey data; the only current reader is survey_monkey
Note
There's no way to irreversibly anonymize this data and still allow post-hoc joins. The secret in this case (the customer number) is stored openly in the database, the hashing algorithm is explained here, and the number of possible customer numbers is small, so brute forcing the mapping is trivial.
The goal is only to make it harder to extract customer information from this table, so that anyone who does so is doing it deliberately.
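The brute-force concern in this note can be sketched in a few lines. This is only an illustration under stated assumptions: toy_hash() is a hypothetical stand-in for whatever hashing the package actually applies, and the range 1 to 999 is an invented keyspace, not the real range of customer numbers. The point is that when every possible input can be enumerated, hashing all of them yields a lookup table that reverses the mapping.

```r
# Hypothetical stand-in for the package's hashing step (NOT the real algorithm).
toy_hash <- function(x) {
  vapply(as.character(x), function(s) {
    h <- 5381
    for (b in utf8ToInt(s)) h <- (h * 33 + b) %% 2147483647
    sprintf("%08x", as.integer(h))
  }, character(1), USE.NAMES = FALSE)
}

# Assumed keyspace: every customer number that could exist.
candidates <- 1:999

# Hash every candidate once to build a reverse lookup table.
lookup <- data.frame(
  customer_no = candidates,
  customer_hash = toy_hash(candidates),
  stringsAsFactors = FALSE
)

# Hashes as they might appear in the anonymized table.
observed <- toy_hash(c(417, 23))

# Look each observed hash up in the table to recover the customer number.
recovered <- lookup$customer_no[match(observed, lookup$customer_hash)]
```

A stronger hash function would not change the outcome, since the cost is driven by the number of candidates rather than by the difficulty of inverting the hash.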