Video: Gretel Evaluate & Synthetic Data Quality Scoring
Video description
A deep dive on Gretel's synthetic data quality evaluations and scoring
More Videos
Read the blog post
Transcription
Speaker 1 (00:08):
And hello everyone. Welcome to another Gretel virtual event. Today we are doing a deep dive into Gretel Evaluate and synthetic quality scoring. My name is Mason Edgar, and joining me today are Lipica and Amy from our amazing applied research team to talk to us about the synthetic quality score and all of the really cool stuff that goes on behind that. So without any further ado, I'm just going to go ahead and turn it over to y'all; go ahead and present when you're ready.
Speaker 2 (00:40):
Sounds great. So I'll bring up my screen.
Speaker 1 (00:48):
Yes, bring up the thing you're going to present. If I bring it up right now, we're going to get the infinity mirror, which is fun.
Speaker 2 (00:54):
Okay. Okay.
Speaker 1 (01:01):
Okay, so looks like we're still in Infinity Mirror. You ready?
Speaker 2 (01:04):
Okay. So am I sharing?
Speaker 1 (01:07):
You are, but I see the StreamYard screen.
Speaker 2 (01:10):
All right, let's see if I go here.
Speaker 1 (01:13):
Okay, we can see the notebook.
Speaker 2 (01:15):
Awesome. Okay, so what I'm going to do today is step everyone through our Gretel Evaluate blueprint, which lets you create a synthetic quality report on any two data sets. And then we'll step through that report in detail and talk about how we create the SQS score and what goes into it. So in this notebook you can see, as it starts, you first install your Gretel client if you haven't already. I've already done that, so I'm not going to run that cell. Then you import the various libraries that are needed, pay no attention to that, and then you authenticate in the next cell. So if you have run a Gretel blueprint before, then your authentication will be cached. I've run one before, so mine is cached here, but in the event that you haven't, what you'll need to do is go over to the Gretel console. So here I'm on my dashboard and you can see the API access key here. So I regenerate the key and copy it,
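For reference, the setup portion of the blueprint looks roughly like the sketch below. The notebook linked from the blog post has the exact cells; the configure_session call and its arguments reflect how the Gretel client was typically initialized around this time, so treat them as an assumption and defer to the blueprint for the current API.

```python
# Install the Gretel client (run once, e.g. in a notebook cell):
#   pip install -U gretel-client

from gretel_client import configure_session

# Prompts for your API key from the Gretel console the first time;
# the credentials are cached for subsequent runs.
configure_session(api_key="prompt", cache="yes", validate=True)
```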
(02:28)
Then go back in. And here it would prompt me for the API key if I'd never entered it before. Okay. So the dataset that we're going to use today is the US Adult Census Income dataset, which is very popular and used widely for different machine learning test cases. So in this first cell, we grab the original training data. This is the original dataset, and you can see it's got age, work class, all these different fields that come together to predict whether your income is greater than 50K or less than or equal to 50K. Okay. So now, elsewhere, I've already created synthetic data for this dataset, and this was created with Gretel, but you could use any synthetic dataset that you have. So here I'm going to pull up the synthetic data that I'd already created. And again, all the same fields. Looks pretty darn good.
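The two inputs are just data frames (or CSV paths): the original training data and the synthetic data you want to score. A minimal sketch, with placeholder file names rather than the blueprint's actual paths:

```python
import pandas as pd

# Placeholder paths: substitute the URLs/paths used in the blueprint.
train_df = pd.read_csv("adult_census_train.csv")      # original US Adult Census Income data
synth_df = pd.read_csv("adult_census_synthetic.csv")  # synthetic version (Gretel-generated or not)

print(train_df.shape, synth_df.shape)
train_df.head()
```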
(03:31)
Okay. And then all you have to do is click on create the report. So I'm going to go ahead and click on this, and this can take anywhere from 30 seconds up to a minute. So that we don't have to sit here waiting, I already ran this notebook this morning and I have the report that it generates over here. So now I'm going to step you through the report it generated. At the very top of the screen you can see the synthetic data quality score is 90, so that's excellent. Throughout this whole report there are these little question marks, and you can click on them and a lot of information comes up about what you're looking at. So in this little informational snippet, it tells you what the SQS score means, and the purpose of it is to test how well the important statistical information in the training data was maintained in the synthetic data.
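Creating the report programmatically is only a couple of lines in the blueprint. The sketch below is from memory of that blueprint, so treat the class name, arguments, and methods as assumptions and check the notebook if they differ in your client version.

```python
from gretel_client.evaluation.quality_report import QualityReport  # assumed module path

# ref_data is the original training data; data_source is the synthetic data to score.
report = QualityReport(data_source=synth_df, ref_data=train_df)
report.run()           # takes roughly 30 seconds to a minute for a dataset like this

print(report.peek())   # quick summary, including the overall SQS
# The blueprint also saves and opens the full HTML report shown in this walkthrough.
```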
(04:35)
So in a sense, it's like a utility or confidence score: if you were to have done a scientific analysis on the synthetic dataset, would the conclusions you draw be the same as if you had done that analysis on the training dataset? So there are three different metrics that we combine together to get the SQS score, and we'll step through those in detail, but they're field distribution stability, field correlation stability, and deep structure stability. You can see here we also tell you, depending on your use case, what we recommend for an SQS score. So starting down here at the bottom, we say that if your use case is to balance or augment machine learning data sources, or it's machine learning or statistical analysis, then we recommend you have an excellent or a good score, because the statistical integrity of the data is pretty critical to that downstream use case.
(05:43)
And then further up, for demo environments or mock data, and pre-production testing environments, in those scenarios you could have excellent or good, but you could also have moderate. Maintaining the statistical integrity is not as critical in those use cases. And then we say if you have any score other than excellent, you should use our tips and advice to try to improve your model more. You don't have to, but we give a lot of good advice for how you might do that. And if you happen to get very poor, which is rare, but if you do, then that implies that you need significant tuning to improve your model.
Speaker 3 (06:25):
If I may add really quickly, Amy, one place that you can find tips and advice is in our documentation. There's a ton there: what you should do if you have a small dataset, if you have a wide dataset, if you have a primarily numeric dataset, and so on. So that's one place to look. Our blog posts are another. I just wanted to provide some examples of where that advice comes from.
Speaker 2 (06:56):
Yeah, you're breaking up a little bit, but I think I understood what you were saying. Yeah, we have a lot of documentation on how to improve your score on what configs are appropriate depending on the nature of your dataset. Lots of documentation on that.
(07:16)
Okay, so then the three main scores are listed here: field correlation stability, deep structure stability, and field distribution stability. They each have their own score, which just happens to be excellent here, and they combine together to get the overall SQS score, and Lipica will tell us in a little bit how we do that. You can see in this cell we also tell you the row count in your training data and synthetic data, the column count in training and synthetic data, and whether any training lines were duplicated in the synthetic data, which would be very bad. You always want that to be zero. Okay, so now I'm going to click on the little informational tip for field correlation stability. And what it's telling us here is that the way we compute this is we compute the correlation between every pair of fields in the training data, and then we do the same thing in the synthetic data. Then we take the absolute difference between these two correlation matrices. So we basically just subtract the two matrices, take the absolute value, and then average that. To compute the correlation, if the two fields are numeric, then we'll use Pearson's correlation coefficient. If one field is numeric and the other is categorical, we use the correlation ratio. If both fields are categorical, then we use Theil's U.
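As a rough illustration of that computation, here is a minimal sketch of the pairwise-correlation comparison for numeric fields only; the report additionally uses the correlation ratio for numeric-vs-categorical pairs and Theil's U for categorical pairs, which are omitted here, and the function name and averaging details are illustrative rather than Gretel's internal implementation.

```python
import numpy as np
import pandas as pd

def correlation_stability_raw(train: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Numeric-only sketch of the field correlation stability raw score."""
    num_cols = train.select_dtypes("number").columns
    train_corr = train[num_cols].corr(method="pearson")
    synth_corr = synth[num_cols].corr(method="pearson")

    # Element-wise absolute difference between the two correlation matrices,
    # averaged over the unique field pairs (smaller is better).
    diff = (train_corr - synth_corr).abs().to_numpy()
    return float(diff[np.triu_indices_from(diff, k=1)].mean())

# raw_corr = correlation_stability_raw(train_df, synth_df)
```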
(08:59)
So correlation is pretty critical for any kind of downstream statistical or machine learning task, so you want that number to be high. Corresponding to that, we display these heat maps. Here are the training correlations on the left, in the middle are the synthetic correlations, and then on the far right is the difference matrix I was talking about. So you want the first two graphs to look as much alike as possible, and the third graph to be as yellow, I guess that is, or light green as possible. And Plotly has a little toggle that I find really helpful. You click on it, and you can see here that, going across for this one cell, age and occupation, the training correlation is 0.19 and the synthetic correlation is 0.14, so the difference is 0.06. So that's kind of handy.
(10:00)
Alright, turn that off. All right, so now the second one is deep structure stability. You can click on the little question mark and get a whole bunch of information about that. We wanted another metric that let us look at deeper, multi-field distributions and correlations, and we found that PCA does that quite effectively for us. So the way we do this is we compute a PCA on the training data and then we do a PCA on the synthetic data, and then we look at the principal components and we compute a distributional distance between training and synthetic, and that's the score. PCA is super popular in data science; people use it for dimensionality reduction and for visualization. So in a way it's kind of quick feedback as to whether your synthetic data is going to be effective in a downstream machine learning task. And then, oh, scroll down, we give you this super cool visualization of your PCA. The left is the training data, the right is the synthetic data. So you want these graphs to look as much alike as possible. Here you can see there are two kind of dense clusters in the middle that are repeated in the synthetic data. So we did a pretty nice job there.
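A minimal sketch of that idea, assuming scikit-learn's PCA and, as an illustrative stand-in for Gretel's internal distributional distance, a per-component Wasserstein distance (the actual distance used in the report isn't spelled out here):

```python
import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def top_components(df: pd.DataFrame, n_components: int = 2) -> np.ndarray:
    """Project the numeric columns of a frame onto its top principal components."""
    X = StandardScaler().fit_transform(df.select_dtypes("number"))
    return PCA(n_components=n_components).fit_transform(X)

# train_pcs = top_components(train_df)
# synth_pcs = top_components(synth_df)
# Compare the distribution of each principal component between the two sets;
# small distances suggest the deep, multi-field structure was preserved.
# dists = [wasserstein_distance(train_pcs[:, i], synth_pcs[:, i])
#          for i in range(train_pcs.shape[1])]
```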
(11:34)
Then for field distribution stability, click on that. Okay, so this is a measure where we compare the distribution of a field in the training data to its distribution in the synthetic data, and we use the Jensen-Shannon distance to compute a distance metric. You can then go down, and here we have the cell that gives a nice overview of all the distribution stability. So for every single field here in the dataset, we actually did quite well; we scored an excellent score. I forgot to mention the score bands: excellent is up here, 80 to 100, this green bar; good is 60 to 80, this light green bar; moderate is 40 to 60, this orange bar; poor is 20 to 40, this dark orange bar; and then very poor is 0 to 20, this red bar there.
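To make the per-field comparison concrete, here is a minimal sketch using SciPy's Jensen-Shannon distance; the helper name, the treatment of values as categories, and the example column name are illustrative assumptions rather than Gretel's exact implementation.

```python
import pandas as pd
from scipy.spatial.distance import jensenshannon

def field_js_distance(train_col: pd.Series, synth_col: pd.Series) -> float:
    """Jensen-Shannon distance between the value distributions of one field.

    Works directly for categorical fields; numeric fields should be binned
    (e.g. with pd.cut) before calling this.
    """
    p = train_col.value_counts(normalize=True)
    q = synth_col.value_counts(normalize=True)
    support = p.index.union(q.index)            # shared support for both distributions
    p = p.reindex(support, fill_value=0.0)
    q = q.reindex(support, fill_value=0.0)
    return float(jensenshannon(p, q, base=2))   # 0 = identical, 1 = disjoint

# Column name is illustrative; match it to your dataset's schema:
# field_js_distance(train_df["marital-status"], synth_df["marital-status"])
```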
(12:51)
So in this training field overview, for every field you can see how many unique values there were, how many missing values there were, the average length, the data type, and how good the distribution stability is. If you end up having trouble with some fields and their distribution stability, sometimes looking at these data characteristics can be sort of insightful. So if there's a whole bunch of missing data and the average length is really long, then it might be a tougher field than other fields. And then you can click on any one of these. I'll click on age, and you can go down and see a graph for each one of the fields. So here on age, this is a histogram, and if I mouse over, you can see the x axis is the actual value of the field and the y axis is the percentage of that value in the dataset. So here it's saying ages between 32 and 34 are 5.81% of the dataset. And if it's categorical, like here with marital status, the purple bar is always the training and the green bar is always the synthetic. So you can mouse over, and here a marital status value of never married is 33.24% in the training and 31.86% in the synthetic. So that's a really nice quick snapshot with each of these graphs as to how each distribution is doing.
Speaker 3 (14:44):
Hey Amy, in the overview of each variable, those numbers there, those are for the training set, right? So the number of unique values, missing values, average length?
Speaker 2 (14:55):
Yes, those are for the training.
Speaker 4 (14:58):
Awesome. Let's
Speaker 2 (14:59):
See what little information we might get from this. Just tells you what I told you.
Speaker 4 (15:09):
Nice. Okay,
Speaker 1 (15:11):
I have a quick question though. Sure. So can you scroll back up to the top? Yeah, right there. So you said we were using field correlation stability, deep structure stability, and field distribution stability. How did you decide that those three were the ones that make up a good synthetic data quality score? I imagine there are probably other stabilities or other things (I don't know, I'm not well versed in this), but how did you decide that those three things were what was important for measuring quality?
Speaker 2 (15:41):
A ton, a ton of research and explorative POCs. So the thing is, there really are no standard metrics for quality when it comes to synthetic data. What we found was that field correlation stability and field distribution stability are common, among the more popular stats used, and they're pretty basic to what's needed if you're going to use the data in a downstream statistical or machine learning task. But then we wanted something more than that. We wanted something that looked at the deep structure and multi-field correlations and distributions. And while PCA isn't commonly used for this, we explored it and found it to be super useful. The graph and the score are just a nice quick visual of: if I were to use the synthetic data in a downstream task, is it going to work for me or not? Because it is what most people do when they start a machine learning task. They'll visualize the data with PCA; sometimes they'll reduce the dimensions with PCA. So if we're not holding on to that PCA structure in the synthetic data, then you're going to have trouble. So that's how we came up with the three, and we're always open to new metrics. We're constantly researching, seeing what might be a nice addition to add to the report, but these three covered the basic bases for us, and so that's what we started with.
Speaker 3 (17:28):
Yeah, just adding to that: there are a lot of metrics that are really specific to the downstream task. So if your task is purely statistical and you only care about correlations, you might want to focus on that and have maybe a little bit more rigor in comparing correlations, or maybe you're interested in a specific pairwise correlation or something like that. And so those are very specific metrics, and the same goes for specific machine learning models. If you're looking at time series data, maybe you're interested in an ARIMA model or something like that. Those are very specific, and so what we have here is something that applies to a wide variety of data sets and a wide variety of types that you find in tabular data.
Speaker 2 (18:16):
Right, wide variety of end use cases.
Speaker 1 (18:21):
Fantastic. So another random question that I have, these are Mason's questions, I get to ask questions. I'm looking at just these synthetic quality scores and they're all relatively close to each other: 88, 90, 93. Are there cases where maybe your field correlation score is in the nineties, but your deep structure stability is like 20? What would cause that? Are these scores relatively dependent on each other, or are they completely independent? Could it be possible for them to be very heavily skewed across the three independent scores?
Speaker 2 (18:56):
They can be skewed; I've seen that happen many times, depending on whether the config you chose to run maybe wasn't the best for this dataset. But there is some dependence between the three. So when Lipica shows us the formula for how we combine these scores into one overall score, you'll note that we do weight them a little differently, because they do overlap a little bit in what they're telling us about the quality of the data.
Speaker 1 (19:33):
Okay, fantastic. That sounds like it's a perfect segue to get into that part of the presentation. So I guess you'll show us the formula. It sounds so mysterious, even though it's on our blog, it's easy to find. Let me add your screen there and here we go.
Speaker 3 (19:49):
All right, so what I've pulled up is a little formula that one of our colleagues, Andrew, wrote up in a blog he published on something that uses the synthetic quality score, but it's cooler and a slightly different subject matter, so check that out if you're interested; it covers sampling methods. But for now we're going to focus on the SQS formulas. So Amy talked through the three different components of SQS. There's deep structure stability, which is the PCA component, there's distribution stability, and there are correlations. Basically, each of these different components has a raw score associated with it. So Amy talked about using a Jensen-Shannon distance to calculate, for example, one of the raw scores. And that's on one scale, the PCA score is on a different scale, and the distribution score is on a different scale. So we have a ton of raw scores, and what we do is, for each of these individual scores, so you can see this left hand side here, we fit second-order polynomials, and we do that using a collection of data sets that we've curated, a large number of data sets, and we find the right, or most adequate, coefficients for these, so these alphas and betas and deltas.
(21:15)
Then once we have scores, each of these is on a zero to 100 scale, or a zero to one scale, effectively the same thing, we take those and we combine them into the SQS score. So again, like Amy said, this is a weighted average of the different scores, because yes, there are relationships between the PCA score, the distribution score, and the correlation score. And so we combine that all into one score that goes from zero to 100, and it's pretty consistent. So if you had two totally different use cases, two totally different data sets, and you ended up with similar PCA, distribution, and correlation scores, you'd end up with a similar synthetic quality score. And that's kind of awesome, because you get to compare across different use cases and across different data sets. You can compare SQS and learn something about those data sets.
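As a rough illustration of that combination, here is a sketch; the polynomial coefficients and weights below are placeholders invented for the example, not Gretel's fitted values, which come from the curated collection of datasets described above (see the blog post for the actual formula).

```python
def sqs(raw_corr: float, raw_pca: float, raw_dist: float) -> float:
    """Illustrative sketch of the SQS combination (all coefficients are made up).

    Each raw metric is mapped onto a 0-100 scale with a fitted second-order
    polynomial, then the three component scores are blended with a weighted
    average. The real alphas, betas, deltas, and weights are not reproduced here.
    """
    def poly(raw, alpha, beta, delta):
        # Second-order polynomial, clamped to the 0-100 score range.
        return min(100.0, max(0.0, alpha * raw**2 + beta * raw + delta))

    corr_score = poly(raw_corr, -50, -40, 100)   # placeholder coefficients
    pca_score  = poly(raw_pca,  -30, -55, 100)   # placeholder coefficients
    dist_score = poly(raw_dist, -45, -35, 100)   # placeholder coefficients

    weights = {"corr": 0.35, "pca": 0.30, "dist": 0.35}   # placeholder weights
    return (weights["corr"] * corr_score
            + weights["pca"] * pca_score
            + weights["dist"] * dist_score)
```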
Speaker 5 (22:13):
Cool.
Speaker 1 (22:15):
Fantastic. The formula scares me. Anytime I see Greek letters in a mathematical formula, I immediately get terrified and I run away as fast as I can.
Speaker 3 (22:23):
It's just polynomials, just polynomials
Speaker 1 (22:28):
Bad words, all bad, bad words. I'm getting weird flashbacks to gamma distribution and advanced probability and stuff. I remember taking that in college and I was like, oh, it was so cool because the math you can do with probability is so cool. It's stuff I was interested in, but then I had to do the math and I was like, oh no, I don't want to do this. So yeah, that's pretty good. Okay, so there's one other question that I was going to just ask. This is more of an open-ended question. So we have the synthetic quality score. If you're a user who just generated their first set of synthetic data and now they're looking at this synthetic quality score, how should they interpret that score? What is the use case? What are some of the use cases behind it or how can your beginner user get the most out of your synthetic quality score from the time of first execution?
Speaker 2 (23:24):
Well, I would say, going back to when I showed you the top of the synthetic quality report, if you do that drop down, we list a whole bunch of use cases, whether it's machine learning or pre-production testing, and you should see whether your score maps to that use case. And if it does, then you're golden. So remember that when you're doing data augmentation for a statistical or, more usually, machine learning task, or you're doing some other data science or machine learning analysis on your data, you really need to have an excellent or a good score. You really need to have maintained those important statistical characteristics in the synthetic data. But for other use cases, so pre-production testing, demoing, things like that, it's not so critical that the statistical integrity was maintained. So excellent, good, or even moderate would be just fine. And if you're getting poor or very poor, that's when we point you to our tips on how you might want to improve your synthetic data. So that's what you should first do: look at your score and see if it maps to your particular use case.
Speaker 1 (25:01):
Awesome. Yeah, I was literally just trying to pull that up. Thank you, Lipica. Great. Let me look and see if we have any questions in chat yet. So I have something in my notes here. I don't know exactly what it means; sometimes I should take better notes for myself. It says, distinguishing between high fidelity versus low fidelity data. Is that something that the quality score can help with, or does that even make any sense, or did I make a weird note? Now I don't even know.
Speaker 3 (25:37):
Yeah, I think certainly. So I mean, fidelity is right: how well are these distributions mapped? I guess, maybe to take a step back even, what we try to do with synthetic data, at least with our synthetic data models at Gretel, is to learn the underlying distribution of the dataset that you're trying to create a synthetic version of. And so if the model is able to well understand and well recreate what that distribution is, and you generate synthetic data from such a model, chances are you'd have a high fidelity data set. There's a lot of additional stuff that we do to ensure that, beyond just relying on the model, but ideally, if you have, for example, a data quality score of 90, I would feel quite comfortable going and trying that out on whatever my downstream task is.
(26:35)
So maybe it's not the specific example, but somewhere along the way we looked at the US adult income data set from UCI, and there the goal is to predict whether an individual's income is over 50K or under 50K. And so there I might take my synthetic data set and say, okay, let me try with my synthetic data set instead. And I would have reasonably high confidence that any classifier that I trained on the original data set, whatever the quality was there, I may well be able to maintain that quality or performance of my classifier with the synthetic set. I may also choose to augment that data set with my synthetic set. So it really depends, but with a score of 90, like we saw in Amy's example, I'd feel quite confident doing that. Now, on the other hand, say I had a score of maybe 60 or even 50 and I was simply looking to augment my training data with some examples from the minority class in that dataset (I think there's a slight class imbalance, and there are fewer examples of folks with higher income). I might choose, even if I did have poor synthetic quality, to take some samples from the minority class from my synthetic data and augment my training set. So it obviously depends on the use case. Fidelity of course matters to different degrees depending on your use case. But for something like that, I'd go forth and try even with a synthetic quality score of 60.
Speaker 1 (28:15):
Interesting. Oh, go ahead Amy.
Speaker 2 (28:18):
And in fact, we did write a blog where we took maybe 10 or so very popular Kaggle data sets, and one among them was this US adult census income dataset. Each of these data sets had prediction tasks associated with them, and we ran them through a whole suite of different types of classification models and found that we do as well on the synthetic data as we do on the training data.
Speaker 1 (28:55):
Awesome. Okay, so that actually brings up another question that I have. So you said 90 is a good number, so I guess excellent. Should people be focusing on excellent? Because there are some people out there who are going to be perfectionists, and they're going to be like, I need to have a hundred percent quality score, or I need to have a 99. Is there any value, say we had that synthetic data score of 90, is there any value in really trying to chase theoretical perfection? If I have 90, should I try to achieve 95 or 96, or do you feel like that might almost be more effort than it's worth?
Speaker 2 (29:31):
I think it's more effort than it's worth. I think you can see in this particular blog that we brought up that the SQS scores were not all 90. They ranged from 70 to 90 or something like that, and we still maintained the same F1 or accuracy score in the prediction task with the synthetic data as we did with the training data. So I think if you've got a score of 80 and up, or even 60 and up, go ahead and try your downstream task. You're probably fine.
Speaker 1 (30:13):
Okay. So I guess that kind of leads into one of the questions that we had earlier, which was: does poor quality always mean poor performance on downstream tasks? Maybe, as you said, even 60; I mean, some people would probably perceive 60 as bad, since we're all trained by the academic system that 60 is a bad number. But does poor quality always mean poor performance?
Speaker 3 (30:43):
My experience has been no. Often I'll use synthetic sets to augment my original training data, and other times I'll use synthetic sets standalone, and usually in the first case I'm happy to even use synthetic data that came from a model that had a quality score of 60 or 50. I'm happy to do that because it cuts the additional time that it takes to generate more synthetic data or build a better model and all that hassle. I can just get to the point pretty quickly. But I guess one thing to mention, also thinking about your last question: if we're going to keep trying to improve our quality score, and let's say the perfectionist in you and me and Amy says we want a quality score of a hundred, well, a really simple way of getting a quality score of a hundred is to memorize your training data. And that completely defeats the point of using synthetic data and using a synthetic data model. So chasing perfection might mean actually chasing memorization, or chasing something that compromises the privacy that we're hoping to gain by using synthetic data. So it's not always a worthwhile goal to chase 100, but 90, certainly, I mean, I'm super happy with 90.
Speaker 1 (32:09):
I was going to ask that. Yeah, I was going to ask: if you get closer to a hundred, does that potentially lead to it almost just being the original data again, and could that lead to people being able to perform some sort of identification attacks on it or something?
Speaker 2 (32:25):
We never ever see a score of a hundred.
Speaker 1 (32:28):
Oh, that's good to know. That's awesome. I've never, okay. What is the highest score you think you've ever seen?
Speaker 2 (32:34):
Oh, 96, 98.
Speaker 3 (32:38):
Yeah, I think I'd say 90.
Speaker 2 (32:40):
It's important to note that our zero to 100 scale for SQS is different than the educational grading system. So for example, 80 to 100 we consider an A, excellent, versus academically it would be 90 to 100. And then 60 to 80 we would consider a B, good. So it's a little bit different than the normal grading system.
Speaker 1 (33:06):
I wish you would've graded at my school, Amy. I would've been a straight A student the whole time.
(33:14)
Oh my goodness. Well, that's good to know. I know that we've never explicitly said it's like the academic system, but when people see a scale of a hundred, they just get this flashback of school and stuff, so that's fantastic. Cool. So another question we have here: is there any interesting research happening with regards to synthetic quality? I know y'all said y'all did a lot of research when y'all were determining how to do all of this, but is there anything interesting happening right now, or any fun things that have got y'all excited about maybe the next steps of synthetic quality in general or here at Gretel?
Speaker 3 (33:52):
There's so much fun stuff out there. I mean, the great thing about the research community is that someone or other is always publishing about important things. And as synthetic data gains traction, we have more folks thinking about synthetic quality. Now, there's a lot in specific fields, so good quality medical data looks really different from good quality sensor data from autonomous vehicles. So you might find different research groups doing different strains of research on synthetic quality. But there are some fun ones. One that our team has been talking about a little bit is this paper that I have pulled up; it's called How Faithful Is Your Synthetic Data, with some really, really fun results where they talk about how we can look at fidelity, diversity, and generalization of synthetic data, how that compares to the training data, the original data, and how that might determine good quality versus bad quality.
(34:58)
And they do this with a lot of cool math, a lot of topology, and basically the idea is, for each synthetic sample, can you determine if it's a good sample, versus how we've been thinking about this, which is, for a given synthetic data set, comparing that to the training set, how good is that synthetic set? So that's a really exciting research paper that's been pretty recently updated. There's also another one that's been out for a while. It's called General and Specific Utility Measures for Synthetic Data. And here they talk a lot about pMSE, so really comparing the propensity mean squared error between the synthetic set and the original training set. I think this is a lot more in line with the interests of the Royal Statistical Society, so the statisticians out there who are using synthetic data for really statistical models and inference. I think a lot of the things in here make a lot of sense: using propensity scores, or propensity mean squared error, as a metric and building metrics off of that.
(36:10)
So there's a lot of fun stuff happening. There has been a lot of research over the last few years, but certainly things that we're excited about. And really, when we look at a new research paper, a lot of what we think is, hey, can we use this? But also, in the framework of this research paper, where do we land? Are we meeting the criteria? Where do we fall short? And beyond the specific research paper, if we were to use it in Gretel Evaluate or in our synthetic quality report, would it apply broadly to the data sets that we see? Which obviously covers a lot of different industries, lots of different shapes and sizes and types of data. So there's a little bit more critical thinking and then application to the things that we see our users wanting. And that's a benefit too.
Speaker 1 (37:11):
I couldn't find the mute button there for a second. Awesome. Well, I think that leads us to our last question, which I will share my screen here for. And the last question is, can I use Gretel Evaluate with synthetic data that's not generated with Gretel?
Speaker 2 (37:28):
So the answer to that is absolutely. You can enter any two data sets you want. In the demo I did, I used synthetic data created by Gretel, but you could have it created by anyone to see how your particular model did. Yep, it's open to any two data sets.
Speaker 1 (37:55):
Fantastic. I've dropped the link in the chat, but this blog is the blog announcing Gretel Evaluate. If you're looking for the notebook that Amy demoed, it's right here in the blog. You can just come here, immediately grab it, and run your own Gretel Evaluate. One thing I'm always really happy about with Gretel is I think our SDKs are really simple, straightforward SDKs to use. So I don't think you'll have too much of a difficult time, and you can start playing around: generate your own synthetic data using Gretel and then measure it, or bring in your own data and see how it does. Okay. So I have a random question, and this could be completely useless or not true. Say you had two data sets that were similar, but they were not synthetic copies of each other. What happens if you were to run that through Gretel Evaluate? I mean, it probably wouldn't get a really great score, but I don't know, this is me just spitballing right now, I'm making it up. But say I had two data sets where, when I look at them, they have the same fields and very similar quality. Is there any value in evaluating two nonsynthetic data sets against each other using this metric? Or is that just Mason making up stuff?
Speaker 2 (39:12):
Yeah, I could see that. If you took one sample of the training data, and then you took a separate sample, and you wanted to see whether your two samples were statistically equivalent to each other, then you could run those two samples through Gretel Evaluate and make sure that the statistical integrity in both of them matches up, so you know that your sampling technique is appropriate.
Speaker 1 (39:43):
Oh, I didn't even think about that. So say you have a really large data set and, instead of generating synthetic data, you just sample, say, 10,000 rows out of a billion-row dataset. You could then compare them and see if the sample is representative. Oh, that's so cool. I never even thought of that.
Speaker 2 (39:59):
Yeah.
Speaker 3 (40:00):
Another example I have is you could use Gretel Evaluate to examine distribution shift. So, kind of similar to what Amy said, but let's say you collected some data two years ago and you've collected data every year since. You might want to see whether the distribution of that data has changed. So it could be something like people's preferences, or your favorite survey that's done across the US; you might see if there's a change in the distribution and how people are responding to whatever you're trying to gauge through the survey. So that's another use, I think, for Gretel Evaluate.
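A quick sketch of that drift-check idea, reusing the same report pattern from the walkthrough; the class name and file paths are assumptions carried over from the earlier sketch, so check the blueprint for the exact API.

```python
from gretel_client.evaluation.quality_report import QualityReport  # assumed module path

# Hypothetical files: this year's survey responses vs. responses from two years ago.
# No synthetic data involved; a high score suggests little distribution shift
# between the two collections, while a low score suggests the distributions drifted.
drift_report = QualityReport(data_source="survey_2022.csv", ref_data="survey_2020.csv")
drift_report.run()
print(drift_report.peek())
```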
Speaker 1 (40:44):
Oh, that's so cool. I love that. I'm going to try that now. I didn't even think about that. So awesome. Well, I think we're pretty much up with things to talk about this time. Do either of y'all have anything else you want to chat about just off the top of your head or are we good?
Speaker 2 (41:00):
I think I'm good.
Speaker 1 (41:02):
Awesome. Well, thank you so much. It's been great having Amy and Lipica here. I always love getting to learn from really smart people, so I just got to sit back and watch and learn about Evaluate and all these fun things. So for any of you who are watching, oh, I have to pull something up, give me one second. We have a little giveaway that we always do. Where'd it go? Okay, we're going to make that a little bit bigger. So if you are watching today, as always, we're doing our swag giveaway. That's not what I wanted. Make it bigger. There we go, bigger. So if you've enjoyed this talk today and you want to learn more about synthetic data, you can either scan this QR code or go to the link that you see on your screen, grt.ai/evaluate-deep dive.
(41:51)
You have a week to do this, so if you're watching after the fact, you have until August 10th, 2022 to go in. Give us your name if you want some swag, give us your shipping information, and we'll ship you some stickers. And then if you want to hear more about Gretel, there's a little checkbox, yes or no. Let us know if you want to hear more, and we'll reach out to you and inform you about... sorry, my brain just went into different words. We'll inform you more about Gretel and all the fun things. If you want to catch more of our Gretel live streams and you're on YouTube or LinkedIn, be sure to follow us. If you're on YouTube, make sure to ring the little bell to get the notifications. We will be going live again, I think, twice more this month.
(42:30)
Next week I'll be back doing an introduction to synthetic data, so starting at the beginning, and then we have another deep dive we'll do later in the month. I don't think we've determined the topic on that one yet. But thank you everyone for tuning in. Thank you, Amy and Lipica, so much for joining us. I'm looking forward to having you back in the future to learn about a lot more cool stuff. And if y'all ever have any topics y'all want us to talk about, please let me know and we'll chat about them. But thank you so much, thank you everyone, and we'll see you next time.
Speaker 2 (42:58):
Thanks, Mason.