Our Services
Collection Design
Our capture engineers collaborate closely with clients to understand the goals of the testing and capture. This relationship allows us to design a high-quality, repeatable process that achieves the project’s purpose. Early samples and quickly incorporated feedback help ensure the results are on target, with very little unusable data.
When designing a data collection, you need to strike a balance between naturalistic and controlled data. If your data is too natural, it may contain too much noise to sufficiently train or test an algorithm. But if the data is too controlled, it may not reflect your device’s actual use cases.
Choosing the right environment depends on understanding your technology and the objectives of the client engineering team. In some cases, a natural setting best reflects how the device will be used; in others, you may want to collect data in a controlled environment in order to simulate specific conditions.
Participant Recruitment
Data collection projects may have specific demographic requirements, and gathering this data may require hundreds or even thousands of participants.
Languages, accents, ethnicities, and age ranges are some examples of the criteria you may need to account for.
Finding enough participants who match your demographics can be a challenge. Robot Monster can draw on its established network of participants for any type of study.
Environment
Field data collection requires both controlled and authentic environments. A controlled environment is used to accurately measure and collect the necessary data. An authentic environment reflects the conditions in which the device is expected to operate.
Robot Monster finds and secures the environments best suited to the needs of each project while preserving naturally occurring variation.
Robot Monster handles all of the arrangements, logistics, and release paperwork required to use these environments. This may mean securing rental properties or building custom environments to meet the project’s needs.
Field Data Collection
Robot Monster can simulate testing environments as needed or locate a natural environment that meets the requirements of the test. Robot Monster’s extensive testing studio can simulate a wide range of home environments for focus group testing and data capture. Lighting, room dimensions, and audio can all be varied to simulate the specific conditions a project needs.
Robot Monster can also go on location to participants’ own environments, both for greater variation and to test in actual homes. For more controlled field data collection, Robot Monster has a private lab and studio at its San Francisco location. Together, these let us balance precise measurement and control with the authenticity of real use-case environments, producing a more complete and varied field dataset.
Robot Monster collects three types of field data, primarily for machine learning applications.
Training Data
Training data is used to fit the initial machine learning model.
This is the most basic form of data that we gather, process, and return to our clients. Client AI developers feed this data into their algorithms, which detect patterns and learn from it. This field data establishes the baseline functionality of the product.
An example of field training data would be recordings of people speaking specific voice commands, collected for a speech recognition device. The recordings can then be transcribed (for natural-language collections) and annotated before being fed into the algorithm.
Validation Data
Validation data is used to evaluate the model during the training process.
When developing machine learning models, you would typically collect validation data alongside the initial dataset. A portion of the original dataset is set aside from training and used as validation data to tune the model’s hyperparameters (settings, such as the learning rate, that are not learned directly from the data).
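To illustrate setting a portion of the dataset aside, here is a minimal Python sketch that shuffles a list of examples and holds out a fraction of them as validation data. The function name `split_train_validation`, the 20% default, and the fixed seed are illustrative assumptions, not Robot Monster’s actual procedure.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    examples: Sequence[T], validation_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hold out a random fraction of the dataset as validation data.

    Hypothetical helper: the held-out examples are never used to fit the
    model; they are only used to compare settings during training.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    shuffled = list(examples)
    # A seeded shuffle makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation


# Example: 100 hypothetical recording file names, split 80/20.
recordings = [f"recording_{i:03d}.wav" for i in range(100)]
train, val = split_train_validation(recordings)
print(len(train), len(val))  # 80 20
```

For voice data, a purely random split like this can place recordings from the same participant in both subsets. Grouping recordings by participant before splitting keeps the validation speakers genuinely unseen, much like testing the baby with circles it has never seen.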
To use a human example: imagine a baby trying to learn what a circle is. You can train the baby with examples of circles cut out on pieces of paper, and the baby will eventually come to recognize those circles.
But how can you be sure the baby actually knows what a circle is?
The only way to know the baby can actually identify the concept of a circle is to test with new circles the baby has never seen before. Only then can you be sure the baby recognizes the round shape as the important feature, rather than the colors or type of paper you used.
These new circles are the validation data: a similar dataset to the original training data, but specific instances that haven’t been seen yet.
Testing Data
Testing data is used to evaluate the final performance of a model.
Whereas validation data is used to adjust your model, testing data is used to evaluate the overall quality of your model, or to compare its performance to other potential models.
As with validation data, you can collect training and testing data concurrently.
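As a sketch of comparing candidate models on testing data, the Python below scores each model by accuracy on the same held-out test set. The toy test set, the keyword-based "models", and the function names `accuracy` and `compare_models` are hypothetical, and accuracy is only one of many possible evaluation metrics.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Each test example pairs an input (here, a transcribed command) with its true label.
Example = Tuple[str, str]
Model = Callable[[str], str]


def accuracy(model: Model, test_set: Sequence[Example]) -> float:
    """Fraction of held-out test examples the model labels correctly."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = sum(1 for text, label in test_set if model(text) == label)
    return correct / len(test_set)


def compare_models(models: Dict[str, Model], test_set: Sequence[Example]) -> Dict[str, float]:
    """Score every candidate model on the same test set so the results are comparable."""
    return {name: accuracy(model, test_set) for name, model in models.items()}


# Hypothetical toy test set and models, for illustration only.
test_set: List[Example] = [
    ("turn on the lights", "lights_on"),
    ("lights off please", "lights_off"),
    ("play some music", "play_music"),
    ("turn the lights off", "lights_off"),
]


def keyword_model_a(text: str) -> str:
    # Naive: any mention of "light" is treated as a request to turn the lights on.
    return "lights_on" if "light" in text else "play_music"


def keyword_model_b(text: str) -> str:
    # Checks for "off" before defaulting to turning the lights on.
    if "light" in text:
        return "lights_off" if "off" in text else "lights_on"
    return "play_music"


scores = compare_models({"model_a": keyword_model_a, "model_b": keyword_model_b}, test_set)
print(scores)  # {'model_a': 0.5, 'model_b': 1.0}
```

Because both models are scored on identical, previously unseen examples, the difference in their scores reflects the models rather than the data they happened to be evaluated on.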
Challenges of Field Data Collection
As with any project, there are lots of moving pieces to manage when it comes to field data collection.
It is often more cost-effective to outsource field data collection than to discover and work through all of these challenges by yourself.
Field data collection is best suited to projects with specific environmental requirements, such as particular acoustics for sound recordings or a particular visual environment for images or videos. If the collection requires specific hardware, field collection is usually the best approach.
Recording people in a sound booth would not capture artifacts such as car horns or traffic noise that occur when a real person uses the technology, so it’s important to build those artifacts into the training data.
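Field collection captures such artifacts directly. Where that isn’t possible, one common approximation is to mix recorded background noise into clean speech at a chosen signal-to-noise ratio (SNR), defined as 10·log10(P_signal / P_noise) in decibels. The Python sketch below shows that swapped-in technique; it is not described here as Robot Monster’s method. It assumes audio is a list of float samples at a shared sample rate, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add background noise to a clean recording at a target SNR in dB.

    Hypothetical helper: the noise clip is looped or truncated to the clean
    recording's length, then scaled so that
    10 * log10(P_clean / P_scaled_noise) == snr_db.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Repeat the noise clip as needed so it covers the whole recording.
    looped = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = mean_power(clean)
    p_noise = mean_power(looped)
    if p_noise == 0.0:
        raise ValueError("noise must not be silent")
    # Solve P_clean / (gain^2 * P_noise) = 10^(snr_db / 10) for the gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, looped)]


def measure_snr_db(clean: Sequence[float], mixed: Sequence[float]) -> float:
    """Recover the SNR of a mix by subtracting the known clean signal."""
    added_noise = [m - c for m, c in zip(mixed, clean)]
    return 10 * math.log10(mean_power(clean) / mean_power(added_noise))
```

A real pipeline would also need to handle clipping when the mixed samples exceed the valid amplitude range, and synthetic mixing cannot reproduce room acoustics the way recording on location does.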
Collecting Safely
COVID-19 safety precautions for all data collection ensure the safety of both participants and the Robot Monster team. Equipment is disinfected after each location visit, and all team members wear appropriate PPE for the entirety of the location capture.
Privacy Protection
Following NIST security processes and protocols to safeguard participants’ private information is essential. Robot Monster ensures that NIST-standard protocols are always followed so that a participant’s information is never put at risk. All team members are trained, and multiple security measures are in place to uphold this.